110. (Amended) A transformed maize seed which has been transformed with a 
plant polynucleotide to express a polypeptide in the endosperm of the 
transformed maize seed, wherein the transformed maize seed exhibits an 
elevated level of lysine or a sulfur-containing amino acid compared to a 
corresponding non-transformed maize seed. 

REMARKS 

Reconsideration of the present application is respectfully requested. 

Claims 76-79, 90-93 and 95-111 are pending in the application. As discussed 
in detail below, the claims have been amended to delete certain words objected to 
by the Examiner. 

Claim 104 is rejected under 35 USC 112, first paragraph, as containing 
subject matter which was not described in the specification in such a way as to 
reasonably convey to one skilled in the relevant art that the inventors, at the time the 
application was filed, had possession of the claimed invention. The Examiner states 
that there does not appear to be support in the specification for the specific mole % 
recited in the claim. 

The Examiner's attention is drawn to page 6, lines 14-21 of the present 
application. High lysine content protein and high sulfur content protein are described 
in the specific terms found in claim 104. However, in order to expedite prosecution, 
claim 104 has been amended to delete "to about 50 mole %" and "to about 40 mole 
%". "At least" has been added before "about 7 mole %" and "about 6 mole %". Support 
for the amendment is found at the same location in the application. 
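For illustration only, the lower bounds now recited in claim 104 can be checked against a deduced amino acid sequence. The Python sketch below is a hypothetical aid and not part of the application. It counts residues in a one-letter sequence and assumes that methionine and cysteine are the sulfur-containing amino acids. It also treats "at least about 7 mole %" and "at least about 6 mole %" as hard cut-offs, even though "about" is not a precise limit. The function names and the example peptide are invented for this sketch.

```python
# Hypothetical helper: mole % of selected residues in a protein sequence,
# compared with the lower bounds recited in amended claim 104.

LYSINE = {"K"}
SULFUR_CONTAINING = {"M", "C"}  # assumption: methionine and cysteine


def mole_percent(sequence: str, residues: set) -> float:
    """Return residues of the given types as a percentage of all residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    hits = sum(1 for aa in seq if aa in residues)
    return 100.0 * hits / len(seq)


def meets_claim_104_floor(sequence: str,
                          lysine_min: float = 7.0,
                          sulfur_min: float = 6.0) -> bool:
    """True if either lower bound is reached (the upper bounds were deleted)."""
    return (mole_percent(sequence, LYSINE) >= lysine_min
            or mole_percent(sequence, SULFUR_CONTAINING) >= sulfur_min)


if __name__ == "__main__":
    peptide = "MKKAGLSCVDEKMTRALGSW"  # invented 20-residue example
    print(mole_percent(peptide, LYSINE))             # 15.0
    print(mole_percent(peptide, SULFUR_CONTAINING))  # 15.0
    print(meets_claim_104_floor(peptide))            # True
```

Because the upper limits were deleted, the check has only a floor. Either condition alone satisfies the lysine-or-sulfur alternative.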

Claims 76-79 and 90-93 remain rejected and new claims 95-111 are rejected 
under 35 USC 112, first paragraph, as containing subject matter which was not 
described in the specification in such a way as to reasonably convey to one skilled in 
the relevant art that the inventors, at the time the application was filed, had 
possession of the claimed invention. 
The rejection is respectfully traversed, and the arguments in the previous 
responses are maintained. The Examiner states that although the specification 
refers to other wild-type polypeptides, Applicant does not describe other modified 
nucleic acids, or plants comprising such nucleic acids, that have increased lysine or 
sulfur-containing amino acids. The Examiner invites Applicant to submit copies of 
references published prior to the filing date of the present application that teach 
other nucleic acid molecules that could be used in the claimed method to increase 
lysine or sulfur-containing amino acids in plants. 

As discussed in detail below, numerous wild-type and modified 
polynucleotides are disclosed in the application and are also known in the art. 
Copies of publications in addition to those previously provided are submitted with 
this response. 

The Examiner states that it is improper to incorporate essential material by 
reference and that the Applicant has not satisfied the written description 
requirement. 

It is respectfully submitted that particular polynucleotide sequences are not 
critical to the broad claims. In fact, it would be impossible to submit all possible 
sequences that could be used in the claims. Claim 78 calls for a polynucleotide that 
encodes HT12 or ESA. As discussed below, these sequences were filed with the 
original application as SEQ ID NOS: 2 and 6, respectively. 

As requested by the Examiner, copies of the references discussed below will be 
provided unless they were already cited on a 1449 form. The location of the 
polynucleotide sequences can readily be determined in the various publications. 
These references demonstrate the skill in the art with regard to polynucleotides that 
encode proteins with elevated levels of lysine or sulfur-containing amino acids. If 
additional publications are needed, they can be provided by Applicant. 

With regard to the ESA nucleic acid, the sequence is found in SEQ ID NO: 6 
(2199-2675) (see Table 2, page 40, of the present application). The hordothionin 
(HT) SEQ ID NO: 1 (3361-2947), the high lysine hordothionin (HT12) SEQ ID NO: 2 
(3361-2947), and the high lysine chymotrypsin inhibitor gene (also called the barley 
high lysine gene, or BHL) SEQ ID NO: 7 (2199-2450) are found in the sequences filed 
and identified in Table 2 of the present application. Additional HT12 sequence 
modifications are found in SEQ ID NOS: 10-13. 

In addition, numerous suitable genes were known in the art, many of which are 
identified in the application. The Examiner is familiar with the Rao patents, as they 
were cited on 1449 forms. US Ser. No. 08/838,763, cited on page 8, line 23 of the 
present application, is now US Pat. No. 5,990,389, cited on a 1449 form as A18. 
US Ser. No. 08/824,379, cited on page 8, line 24 of the present application, is now 
US Pat. No. 5,885,801, cited on a 1449 form as A20. US Ser. No. 08/824,382, cited 
on page 8, line 24 of the present application, is now US Pat. No. 5,885,802, cited on 
a 1449 form as E2. The 10 kD zein storage protein from maize is disclosed in 
Kirihara et al. 1988, Mol. Gen. Genet. 211: 477-484, a copy of which is enclosed. 
Sulfur-rich 10 kD rice prolamin is disclosed in Masumura et al., Plant Mol. Biol. 12: 
123-130, 1989 (A25 on the 1449 form and cited on page 13, lines 7-8 of the present 
application, SEQ ID NOS: 20-21). The maize gene encoding the methionine-rich 
15 kD zein protein is found in Pedersen et al., J. Biol. Chem., 261, 6279-6284 (1986) 
(A26 on the 1449 form and cited on page 13, lines 5-6 of the present application, 
SEQ ID NOS: 16-17). The gene encoding the Brazil nut protein is found in Altenbach 
et al., Plant Mol. Biol., 8: 239 (1987), a copy of which is enclosed. The gene 
encoding a high methionine maize 10 kD zein is found in Kirihara et al., Gene, 7, 
359-370 (1988) (A22 on the 1449 form submitted and cited on page 13, lines 6-7 of 
the present application). Pea genes encoding high sulfur protein are disclosed in 
Higgins et al., J. Biol. Chem., Vol. 261, No. 24, pp. 11124-11130 (1986) (A21 on the 
1449 form and cited on page 12, lines 6-7 of the present application, SEQ ID NOS: 
14-15). A gene encoding a methionine-rich sunflower protein is found in Lilley et al., 
Proceedings of the World Congress on Vegetable Protein Utilization in Human 
Foods and Animal Feedstuffs; Applewhite, T.H. (ed.), American Oil Chemists' 
Society, Champaign, IL, pp. 497-502 (1989) (A23 on the 1449 form and cited on 
page 13, lines 1-5 of the present application). 

Other suitable genes include the 12S seed storage protein gene from rapeseed, 
disclosed in Ryan et al., Nucleic Acids Res., 17 (9): 3584 (1989); a copy is enclosed. 
The sunflower 2S albumin gene is disclosed in Allen et al., Mol. Gen. Genet., 201 
(2): 211-218 (1987); a copy is enclosed. The maize albumin b-32 gene is disclosed 
in Di Fonzo et al., Mol. Gen. Genet., 212 (3): 481-487 (1988); a copy is enclosed. 
The napin gene is disclosed in Josefsson et al., J. Biol. Chem., 262 (25): 
12196-12201 (1987) and Scofield and Crouch, J. Biol. Chem., 262 (25): 12202-12208 
(1987); copies are enclosed. The B1 hordein gene is disclosed in Forde et al., 
Nucleic Acids Res., 13 (20): 7327-7339 (1985); a copy is enclosed. The wheat alpha 
and beta gliadin genes are described in Sumner-Smith et al., Nucleic Acids Res., 
13 (11): 3905-3916 (1985); a copy is enclosed. Wheat gliadin is also disclosed in 
Anderson et al., Nucleic Acids Res., 12 (21): 8129-8144 (1984); a copy is enclosed. 
The pea legumin gene is disclosed in Lycett et al., Nucleic Acids Res., 12 (11): 
4493-4506; a copy is enclosed. Various maize zeins are disclosed in Heidecker and 
Messing, Nucleic Acids Res., 11 (14): 4891-4906 (1983); copies are enclosed. The 
alpha, alpha', and beta subunits of soybean 7S seed storage protein are disclosed in 
Schuler et al., Nucleic Acids Res., 10 (24): 8245-8261 (1982) and Schuler et al., 
Nucleic Acids Res., 10 (24): 8225-8244 (1982); copies are enclosed. The sunflower 
11S gene is described in Vonder Haar et al., Gene, 74 (2): 433-443 (1988); a copy is 
enclosed. The pea convicilin gene is disclosed in Bown et al., Biochem. J., 251 (3): 
717-726 (1988); a copy is enclosed. 

Claims 76-79 and 90-93 remain rejected and new claims 95-111 are rejected 
under 35 USC 112, first paragraph, because the specification is enabling only for 
claims limited to transformed cereal plant seed having an elevated lysine, 
methionine and cysteine content (about 10% to about 35% by weight compared to 
untransformed cereal plant seed) comprising the modified hordothionin gene of SEQ 
ID NO: 2 (HT12), and to vectors, plant cells and transformed plants comprising said 
modified hordothionin gene. The Examiner states that the specification does not 
enable any person skilled in the art to which it pertains, or with which it is most 
nearly connected, to make and/or use the invention commensurate in scope with 
these claims. 

The rejection is respectfully traversed. As discussed above, numerous useful 
genes are cited in the application, and many others were known at the time of filing. 
Further, a Declaration under 37 CFR 1.132 by Rudolf Jung, a co-inventor on the 
application, was submitted October 18, 1999. The results in the Declaration 
demonstrate significant increases in the level of methionine when ESA is used as the 
polynucleotide: increases of up to 30% were demonstrated. 
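Percentage increases of this kind, and those recited in claims 101 and 102, are measured relative to a corresponding non-transformed control. The Python sketch below shows that arithmetic. The function names and example numbers are hypothetical and are not data from the Jung Declaration. Both levels are assumed to be in the same units.

```python
# Hypothetical helper: percent increase of an amino acid level in
# transformed seed over a non-transformed control, in the same units.

def percent_increase(transformed: float, control: float) -> float:
    """Return 100 * (transformed - control) / control."""
    if control <= 0:
        raise ValueError("control level must be positive")
    return 100.0 * (transformed - control) / control


def meets_minimum_increase(transformed: float, control: float,
                           minimum_percent: float) -> bool:
    """Check an increase against a floor such as 15 (claim 101) or 20 (claim 102)."""
    return percent_increase(transformed, control) >= minimum_percent


if __name__ == "__main__":
    # Invented example values (e.g. mg methionine per g seed), not measured data.
    control_level, transformed_level = 20.0, 26.0
    print(percent_increase(transformed_level, control_level))            # 30.0
    print(meets_minimum_increase(transformed_level, control_level, 20))  # True
```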

The Examiner states that claim 104 is not enabled for 50 mole % lysine or 40 
mole % sulfur. 

In order to simplify the claim and expedite prosecution, claim 104 has been 
amended to remove "50 mole % lysine" and "40 mole % sulfur". 

Claims 76-79, 90-93 and 95-111 are rejected under 35 USC 112, second 
paragraph, as being indefinite for failing to particularly point out and distinctly claim 
the subject matter which applicant regards as the invention. 

Claims 76 and 77 have been amended as suggested by the Examiner to 
recite "transformed cereal plant" rather than "transformed cereal plant seed". 

Claims 98 and 99 have not been amended in a similar fashion because there 
is no antecedent basis for "transformed cereal plant". 

The Examiner objects to the phrase "plant derived polynucleotide" in claims 73, 
95-97 and 104-108, as there are many types of derivatives and hence it is not known 
what is encompassed by "derived". 
The claims have been amended as suggested by the Examiner to remove 
"derived". The amended claims now recite a "plant polynucleotide" and encompass 
plant polynucleotides as described throughout the specification. 

Claims 101 and 102 are objected to because the phrase "about 10 times" is 
considered indefinite. The phrase has been removed to expedite prosecution. 

Claims 76-79 and 90-93 remain rejected and new claims 95-111 are rejected 
under 35 USC 102(e) as being anticipated by Falco et al. (U.S. Patent 5,773,691). 

The Examiner states that in view of the indefinite claim language "plant 
derived polynucleotide", it reads on essentially any polynucleotide, because any 
polynucleotide can be "derived" from a plant. As noted above, the claims have been 
amended to remove "derived". The amended claims require a "plant 
polynucleotide". 

The Examiner further states that Falco teaches plant polynucleotides in 
Example 20. 

It is noted that the LKR gene of Example 20 encodes an enzyme that is involved 
in lysine catabolism. To increase lysine, one needs to suppress expression of LKR; 
if LKR is expressed, the level of lysine is decreased. Therefore, Example 20 does 
not anticipate the present claims, which require expression of a polypeptide. 

Claims 76-79 and 90-93 remain rejected and new claims 95-111 are rejected 
under 35 USC 103(a) as being unpatentable over Rao et al. (US Patent 5,885,802) 
in view of Applicant's admission and also over Rao et al. (US Patent 5,990,389). 

The Examiner states that substitution of one promoter for another promoter is 
routine in the art. 

The rejection is traversed and the previous arguments are maintained. 
Namely, there is no motivation or suggestion in the art to use an endosperm-preferred 
promoter, nor any suggestion that doing so would produce beneficial results. 
The Examiner states that the Falco teaching cannot be considered because 
Applicant has not cited a reference or a location in that reference for the quotation of 
Falco. The reference and location are cited below. 

In US 5,773,691, Example 26, Col. 88, Lines 34-41, Falco et al. state: "No 
increase in free lysine was observed in seed expressing Corynebacterium DHDPS 
plus E. coli from the glutelin 2 promoter with or without AKIII-M4". Falco et al. further 
indicate that "lysine catabolism is expected to be much greater in the endosperm 
than the embryo and this probably prevents the accumulation of increased levels of 
lysine in seeds expressing Corynebacterium DHDPS plus E. coli AKIII-M4 from the 
glutelin 2 promoter". 

The DHDPS gene expressed from the glutelin 2 promoter (an endosperm-preferred 
promoter) did not increase lysine in the seed. Falco et al. concluded that lysine 
catabolism is greater in the endosperm, thus preventing an increase in lysine. Falco 
et al. therefore teach away from the present claims, which require an endosperm-
preferred promoter and/or expression of a polypeptide in endosperm. The Supreme 
Court held in US v. Adams, 383 US 39, 148 USPQ 479 (1966) that one important 
indicium of nonobviousness is "teaching away from the claimed invention by the prior 
art or by experts in the art at (and/or after) the time the invention was made." The 
decision maker must consider the prior art as a whole in making an obviousness 
rejection. See also In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988). 
Teaching away in the art is a per se demonstration of lack of prima facie 
obviousness, since there can be no expectation of success. The prior art as a whole 
must be considered. To proceed contrary to accepted wisdom is strong evidence of 
nonobviousness. In re Hedges, 228 USPQ 685, 687 (Fed. Cir. 1986). 

Under 35 USC 103, the statute expressly requires that obviousness or 
nonobviousness be determined for the claimed subject matter as a whole. The 
results and advantages produced by the claimed subject matter must be considered. 
Diversitech Corp. v. Century Steps, Inc., 7 USPQ2d 1315 (Fed. Cir. 1988). As 
discussed above, these results and advantages were not disclosed or suggested in 
the prior art. 

The Examiner states that the motivation for combining the elements of the 
present invention is provided in the Rao reference itself. The Examiner further 
states that, because Rao shows increases in amino acid composition in the seed (the 
major portion of which is the endosperm) with a constitutive promoter, one would 
have been motivated to substitute a seed-specific or endosperm-specific promoter to 
further increase, or to limit the increases to, the seed/endosperm tissue. The 
Examiner concludes that substituting an endosperm-specific promoter would have 
been an obvious modification. 

It is again emphasized that there must be some motivation to make the 
particular claimed combination. There are many possible types of promoters to 
choose from. There was no motivation to choose endosperm preferred promoters. 

Claims 76-79 and 90-93 remain rejected and new claims 95-111 are rejected 
under 35 USC 103(a) as being unpatentable over Jaynes et al. (US Pat. 5,811,654) 
in view of Applicant's admission. The Examiner states that the teachings of Jaynes 
are clearly directed to increasing amino acid compositions in seed and that it would 
have been an obvious modification to substitute an endosperm-specific promoter. 

The rejection is respectfully traversed, and the arguments in the previous 
responses are maintained. In particular, there is no suggestion or motivation to make 
the claimed combination. As discussed in detail above, Falco teaches away from 
using an endosperm-specific promoter. Based on the prior art at the time of filing, 
one would have had no expectation of success when using an endosperm-preferred 
promoter to increase the level of an amino acid in a seed. 

Attached hereto is a marked-up version of the changes made to the 
specification by the current amendment. The attached page is captioned "Version 
with markings to show changes made." 
In view of the above comments and amendments, withdrawal of the 
outstanding rejections and allowance of the remaining claims is respectfully 
requested. 
VERSION WITH MARKINGS TO SHOW CHANGES MADE 
In the claims: 

76. (Twice amended) The method of claim 95, wherein the transformed cereal 
plant [seed is from] is maize, wheat, rice, or sorghum. 

77. (Twice Amended) The method of claim 76 wherein the transformed cereal 
plant [seed is from] is maize or sorghum. 

95. (Amended) A method for increasing the level of lysine or a sulfur-containing 
amino acid in a cereal plant seed, the method comprises transforming a 
cereal plant cell with an expression cassette and regenerating a transformed 
cereal plant to produce a transformed cereal plant seed, wherein the 
expression cassette comprises a seed endosperm-preferred promoter 
operably linked to a plant [derived] polynucleotide encoding a polypeptide, 
and wherein expression of the polypeptide increases the level of lysine or a 
sulfur-containing amino acid in the transformed cereal plant seed compared 
to a corresponding non-transformed cereal plant seed. 

96. (Amended) The method of claim 95 wherein the seed endosperm-preferred 
promoter is heterologous to the plant [derived] polynucleotide. 

97. (Amended) A transformed cereal plant seed which has been transformed with 
a plant [derived] polynucleotide to express a polypeptide in endosperm of the 
transformed cereal plant seed, wherein the transformed cereal plant seed 
exhibits an elevated level of lysine or a sulfur-containing amino acid 
compared to a corresponding non-transformed cereal plant seed. 
101. (Amended) The transformed cereal plant seed according to claim 100 
wherein the amount of lysine or sulfur-containing amino acid in the 
transformed cereal plant seed is increased at least about 15 percent by 
weight [to about 10 times] compared to a corresponding non-transformed 
cereal plant seed. 

102. (Amended) The transformed cereal plant seed according to claim 101 
wherein the amount of lysine or sulfur-containing amino acid in the 
transformed cereal plant seed is increased at least about 20 percent by 
weight [to about 10 times] compared to a corresponding non-transformed 
cereal plant seed. 

104. (Amended) An expression cassette comprising a seed endosperm-preferred 
promoter operably linked to a plant [derived] polynucleotide encoding a 
polypeptide having at least about 7 mole % [to about 50 mole %] lysine or at 
least about 6 mole % [to about 40 mole %] of a sulfur containing amino acid. 

105. (Amended) The expression cassette of claim 104 wherein the seed 
endosperm-preferred promoter is heterologous to the plant [derived] 
polynucleotide. 

106. (Amended) A seed from a transformed cereal plant which has been 
transformed with a plant [derived] polynucleotide to express a polypeptide in 
the endosperm of the transformed cereal plant seed, wherein the transformed 
cereal plant seed exhibits an elevated level of lysine or a sulfur-containing 
amino acid compared to a corresponding non-transformed cereal plant seed. 
107. (Amended) A method for increasing the level of lysine or a sulfur-containing 
amino acid in a maize seed, the method comprises transforming a maize cell 
with an expression cassette and regenerating a transformed maize plant to 
produce a transformed maize seed, wherein the expression cassette 
comprises a seed endosperm-preferred promoter operably linked to a plant 
[derived] polynucleotide encoding a polypeptide, and wherein expression of 
the polypeptide increases the level of lysine or a sulfur-containing amino acid 
in seed of the transformed maize plant compared to seed of a corresponding 
non-transformed maize plant. 

108. (Amended) The method of claim 107 wherein the seed endosperm-preferred 
promoter is heterologous to the plant [derived] polynucleotide. 

110. (Amended) A transformed maize seed which has been transformed with a 
plant [derived] polynucleotide to express a polypeptide in the endosperm of 
the transformed maize seed, wherein the transformed maize seed exhibits an 
elevated level of lysine or a sulfur-containing amino acid compared to a 
corresponding non-transformed maize seed. 
Differential expression of a gene for a methionine-rich storage protein in maize 
Summary. A methionine-rich 10 kDa zein storage protein from maize was isolated 
and the sequence of the N-terminal … amino acids was determined. Based on the 
amino acid sequence, two mixed oligonucleotides were synthesized and used to 
probe a maize endosperm cDNA library. A full-length cDNA clone encoding the 
10 kDa zein was isolated by this procedure. The nucleotide sequence of the cDNA 
clone predicts a polypeptide of 124 amino acids, preceded by a signal peptide of 21 
amino acids. The predicted polypeptide is unique in its extremely high content of 
methionine (22.5%). The maize inbred line BSSS-53, which has increased seed 
methionine due to overproduction of this protein, was compared to W23, a standard 
inbred line. Northern blot analysis showed that the relative RNA levels for the 
10 kDa zein were enhanced in developing seeds of BSSS-53, providing a molecular 
basis for the overproduction of the protein. Southern blot analysis indicated that 
there are one or two 10 kDa zein genes in the maize genome. 

Key words: Zein - Zea mays - Gene expression - Seed development - High 
methionine protein 



Introduction 

The expression of seed storage protein genes is tissue-specific and 
developmentally regulated. These genes are expressed only during defined stages of 
seed development, and the expression is limited to the embryo and/or endosperm 
tissue of developing seeds. In agriculturally important seed crops the expression of 
storage protein genes directly affects the nutritional quality of the seed protein. In 
maize (Zea mays L.) the prolamine (zein) fraction of storage proteins comprises over 
50% of the total protein in the mature seed. Zein polypeptides contain extremely low 
levels of the essential amino acids lysine, tryptophan and, to a lesser extent, 
methionine. Maize seed protein is deficient in these amino acids because such a 
large percentage of the total protein is contributed by the zeins. Several mutations in 
maize affect the expression of zein genes and result in improved nutritional quality 
of the seed protein. For example, in the seeds of plants homozygous for the 
recessive mutation opaque-2 (Mertz et al. …), the levels of the … kDa zeins are 
drastically reduced (Misra et al. …; Soave et al. …). There is a concomitant increase 
in the proportion of more nutritionally balanced protein deposited in the seed. The 
net result is an increase in levels of lysine and tryptophan in the seed (Misra et al. …). 

* Kallestad, a Division of Erbamont, Lake Hazeltine Dr., Chaska, MN, USA 

The inbred line BSSS-53 was characterized by a methionine content 30% higher 
than that of other inbred lines tested (Phillips et al. …). It was later shown (Phillips 
and McClure …) that the increased methionine content in BSSS-53 seeds was the 
result of a twofold increase in the level of the methionine-rich 10 kDa zein storage 
protein fraction. The other zein subfractions were present in levels comparable to 
those found in other inbred lines, and the total protein content and kernel phenotype 
were normal. Amino acid analysis indicated that the 10 kDa zein fraction was 
composed of approximately 20% methionine. 

We are investigating the differential expression of the 10 kDa zein in BSSS-53 
compared to other maize strains. Due to the high methionine content of the 10 kDa 
zein, and since methionine is specified by a unique triplet codon (ATG), the following 
approach was taken to isolate a cDNA clone encoding this polypeptide. A 10 kDa 
zein polypeptide was isolated, and the sequence of the N-terminal … amino acids 
was determined. Based on the amino acid sequence, two mixed oligonucleotides 
were synthesized and used to screen a maize endosperm cDNA library. A full-length 
cDNA clone encoding the 10 kDa zein was isolated by this procedure. We report 
here the purification and N-terminal amino acid sequence of the 10 kDa zein 
polypeptide, and the nucleotide sequence of the cDNA clone encoding this protein. 

The 10 kDa zein is distinguished by its extremely high methionine content (22.5%). 
The increased expression of the 10 kDa zein protein in BSSS-53 was found to be 
associated with elevated levels of 10 kDa zein RNA in the endosperm of developing 
seeds. Southern blot analysis of maize genomic DNA indicated that the 10 kDa zein 
subfraction is encoded by one or two structural genes. 
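Methionine-rich stretches are attractive targets for mixed oligonucleotide probes because methionine has a single codon, which keeps probe degeneracy low. The Python sketch below illustrates that reasoning and is not the authors' procedure. It takes probe degeneracy to be the product of the number of standard-code codons for each residue, then picks the least degenerate window of a chosen length. The peptide is hypothetical and is not the 10 kDa zein N-terminal sequence.

```python
from math import prod

# Number of sense codons per amino acid in the standard genetic code (61 total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode the peptide."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


def least_degenerate_window(peptide: str, length: int) -> tuple:
    """Return (start, window, degeneracy) for the lowest-degeneracy window.

    Ties go to the window nearest the N-terminus (min keeps the first).
    """
    if not 0 < length <= len(peptide):
        raise ValueError("window length must be between 1 and len(peptide)")
    windows = [
        (start, peptide[start:start + length],
         degeneracy(peptide[start:start + length]))
        for start in range(len(peptide) - length + 1)
    ]
    return min(windows, key=lambda w: w[2])


if __name__ == "__main__":
    hypothetical_peptide = "ARSMMQMLGK"  # invented, not the zein sequence
    print(least_degenerate_window(hypothetical_peptide, 4))  # (3, 'MMQM', 2)
```

A window of n residues corresponds to a probe of 3n nucleotides. In this sketch, each methionine contributes a factor of 1, so the methionine-dense window wins.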



Materials and methods 



Plant material. Seeds of maize (Zea mays L.) inbred lines W64A, W23 and BSSS-53 
were kindly provided by R.L. Phillips, Dept. of Agronomy, University of Minnesota, 
St. Paul, MN 55108, USA. Endosperm samples were obtained from seeds of 
hand-pollinated plants grown in the field in …. Leaf samples were obtained from 
seedlings grown in a growth chamber. 



Isolation of zein proteins. Zein proteins were isolated as described by Phillips and 
McClure (…). Protein concentrations were determined against a bovine serum 
albumin standard curve. 
SDS-polyacrylamide gel electrophoresis. SDS polyacrylamide gel electrophoresis 
(SDS-PAGE) was carried out according to the method of Laemmli (1970). 
Separating gels of 15% acrylamide were … mm; preparative gels were 3 mm thick 
and analytical gels were 1.… mm thick. Proteins were visualized after preparative 
SDS-PAGE by soaking the gel in 0.2… M KCl, 1 mM dithiothreitol as described by 
Hager and Burgess (…). Analytical SDS-PAGE gels were stained with Coomassie 
blue. 

Isoelectric focusing (IEF) was performed on 2 mm slab gels using an LKB 
Multiphor apparatus. IEF gels were 5% acrylamide, … M urea, and contained 2% 
pH … ampholytes (Serva). IEF gels were run at … W constant power for 2 h at 10° C 
and then at 15 W for … min at the same temperature. Proteins were visualized after 
IEF by soaking the gel in …% trichloroacetic acid (TCA) or by Coomassie staining. 
For preparative IEF, only a portion of the gel was treated with …% TCA to enable 
localization of the protein bands within the remainder of the gel. 

Elution of proteins from polyacrylamide gels. SDS-PAGE or IEF gel slices that 
contained the protein bands of interest were minced and covered with SDS-gel 
electrophoresis buffer. The protein was then electroeluted from the gel pieces. 
Eluted protein was dialyzed extensively against 70% ethanol and lyophilized. Protein 
samples were further purified by reverse phase HPLC (Mahoney and Hermodson 
1980). 

Amino acid analysis. Samples were hydrolyzed at 110° C in sealed, evacuated 
tubes with glass-distilled 6 N HCl for 24 h. The protein was not reduced or alkylated. 
Analyses were carried out on a Beckman System 6300 amino acid analyzer. 
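Composition figures, such as the approximately 20% methionine reported for the 10 kDa zein fraction, are mole percentages derived from analyzer output. The Python sketch below normalizes measured amounts per residue type to mole %. The residue names and amounts are invented, and the input format is an assumption. The sketch applies no correction for losses during acid hydrolysis.

```python
# Hypothetical helper: convert amino acid analyzer output (nmol per residue
# type) into mole % of the total recovered amino acids.

def mole_percent_composition(nmol_by_residue: dict) -> dict:
    """Return {residue: mole %} for the amounts given (no hydrolysis corrections)."""
    total = sum(nmol_by_residue.values())
    if total <= 0:
        raise ValueError("total amount must be positive")
    return {residue: 100.0 * amount / total
            for residue, amount in nmol_by_residue.items()}


if __name__ == "__main__":
    # Invented amounts, not data from this study.
    analysis = {"Met": 2.0, "Gln": 3.0, "Leu": 1.0, "Pro": 2.0, "Ala": 2.0}
    composition = mole_percent_composition(analysis)
    print(composition["Met"])  # 20.0
```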

Amino acid sequence analysis. Samples were degraded in a Beckman Model 890 D 
sequencer according to the procedure of Edman and Begg (1967) using a slight 
modification of the Beckman 0.1 M Quadrol peptide program (No. …). Prior to the 
addition of the sample to the sequencer cup, 2 mg of Polybrene were dissolved in 
0.… ml of acetic acid and applied to the cup of the sequencer, dried under vacuum, 
and subjected to three complete cycles of automated Edman degradation. The 
material under investigation was then introduced into the cup and sequentially 
degraded. Products generated by the sequencer were converted to their 
phenylthiohydantoin (Pth) derivatives as previously described (Mahoney and Nute 
…). Pth-amino acid derivatives were identified using a Varian Model … ternary high 
performance liquid chromatograph, equipped with a Varian Model … autosampler 
modified for reduced sample loss, a Hewlett-Packard … recording integrator and a 
Beckman 0.4… × 25 cm Ultrasphere ODS … µm column (Zimmerman et al. …; Nute 
and Mahoney …). Pth-amino acids were identified based upon comparison with 
known standards. When the signal to noise ratio fell 
… method designed for use with advanced pUC plasmid vectors (… in preparation). 
The vector … was digested with …, and the DNA was then digested with … to 
provide a single … for reverse transcriptase. Ten micrograms of denatured poly(A)+ 
RNA were annealed to the vector primer in a first-strand synthesis reaction catalyzed 
by … reverse transcriptase. Following second-strand synthesis, duplex cDNA-vector 
was methylated with EcoRI methylase, ligated to EcoRI linkers, and digested with 
EcoRI. The entire population of … was fractionated on agarose gels. Circularized 
cDNA-vector DNA from the individual fractions was used to transform a … (Hanahan 
1983) derivative bearing F′ …. The resulting maize endosperm protein body cDNA 
library, designated PB-2, consisted of 4.3 × 10… independent clones. 



Screening the cDNA library. Colony hybridization using synthetic oligonucleotide 
probes was performed according to the protocol of Wood et al. (…). Colonies 
showing hybridization on duplicate replica filters were chosen for further analysis. 
Positive colonies were picked and colony-purified. Positive clones were verified by 
hybridization of the oligonucleotide probes to Southern blots of restriction 
enzyme-digested plasmid DNAs. 

Template preparation, deletion subcloning and DNA sequencing. Single-stranded 
plasmid DNA for deletion subcloning and DNA sequencing was prepared as 
previously described (Vieira and Messing …). A set of overlapping sequential 
deletion subclones for DNA sequencing was prepared as described by Dale et al. 
(…). DNA sequencing was carried out by the dideoxy method (Sanger et al. 1977) 
with [35S]dATP using the protocol outlined in a kit purchased from Amersham. All 
templates were sequenced at least twice and the sequence was determined on both 
DNA strands. 

Maize DNA and RNA isolation. Genomic DNA was isolated from leaf tissue of 
3-week-old maize seedlings as described by Shure et al. (…). 

For RNA isolations, endosperms were dissected from maize kernels harvested at 
specific times after pollination. The endosperms were frozen in liquid nitrogen and 
stored at -80° C until needed. Endosperm samples were ground to a fine powder in 
liquid nitrogen, and total RNA was isolated as described by Berry et al. (…). 

Southern blot and Northern blot analysis. Maize genomic DNA samples were digested with restriction enzymes and fractionated on 0.8% agarose gels. After staining and photography, the DNA was partially depurinated (Wahl et al. [...]) and transferred to Nytran membrane (Schleicher and Schuell) according to Southern ([...]). Filters were prehybridized and hybridized according to the manufacturer's specifications. Hybridized filters were washed at room temperature for [...] min in [...] SSC, 0.1% SDS, [...]% sodium pyrophosphate (NaPPi), then twice at 3[...] °C for [...] min in [...], [...] SDS, [...] NaPPi. The final [...] was in 1× SSC, [...] SDS, [...] NaPPi.

Northern blot analysis of maize endosperm total RNA was carried out on 1.2% agarose-formaldehyde gels. Denaturation, electrophoresis and transfer of RNA were performed as described by Maniatis et al. (1982), except that Nytran membrane was used in place of nitrocellulose. Filters were prehybridized and hybridized according to the manufacturer's specifications. Hybridized filters were washed as described above for Southern blot hybridization.

The DNA probe used for Southern and Northern blot analysis was a deletion subclone of 10kZ-1 (see Fig. 4), designated 10kZ-U43. 10kZ-U43 lacks the entire poly(A) tail and approximately [...] nucleotides 5' to the poly(A) tail of 10kZ-1. 10kZ-U43 DNA was labeled with [...]TP (New England Nuclear, [...]/mmol) by nick translation (Rigby et al. 1977). The average specific activity of the labeled probes was 1 × 10[...] cpm/µg. Hybridized filters were exposed to Kodak XAR-5 X-ray film for 1-[...] h at -[...] °C with a DuPont Cronex intensifying screen.
Results

Protein purification

Zein-1 and zein-2 fractions were isolated from kernels of the maize inbred lines W64A and BSSS-53 as described in Materials and methods. SDS-PAGE analysis of the zein-2 fractions from W64A and BSSS-53 demonstrated that the 10 kDa zein was present in higher proportion in BSSS-53 (Fig. 1, compare lanes [...] and 4). The 10 kDa zein subfraction was isolated from these two inbred lines by preparative SDS-PAGE (lanes 5 and 6). The 10 kDa zein fractions isolated from the two inbred lines were similar in amino acid content (Table 1). When N-terminal amino acid sequencing was attempted on SDS-PAGE-purified 10 kDa zein from BSSS-53, it was found that this fraction was heterogeneous and no single N-terminal sequence was obtained. The 10 kDa zein was then fractionated by isoelectric focusing (Fig. 2), and indeed several components were detected. The major IEF band was purified by preparative IEF in polyacrylamide slab gels. The purified polypeptide is shown in lanes [...] and 5 of Figs. 1 and 2, respectively. We were able to obtain a partial N-terminal amino acid sequence of this protein fraction (Fig. 3A).

Amino acid sequence analysis

The 10 kDa zein protein presented problems due to its lack of solubility in aqueous buffers. Attempts at reduction and S-pyridylethylation met with extremely low yields (as determined by amino acid analysis), with commensurate loss of material. As such, amino acid sequencing was done in the absence of reduction and alkylation, knowing that this would not allow the identification of cysteine. As shown in Fig. 3 and Table 2, we were able to order the first 30 amino acids, with 5 of the identifications in question. The




Fig. 2. Analytical isoelectric focusing (IEF) of zein polypeptides. Zein fractions ([...]) isolated from the inbred lines W64A and BSSS-53 were analyzed by IEF. Proteins were visualized by staining [...], and the gels were photographed on [...]. Lanes 1 and 2, zein-2 from W64A and BSSS-53, respectively; lanes 3 and 4, 10 kDa zein from W64A and BSSS-53, respectively, purified by SDS-polyacrylamide gel electrophoresis; lane 5, IEF-purified 10 kDa zein from BSSS-53. The anode was at the top.
[Fig. 3A: partial N-terminal amino acid sequence (residues 1-30) of the 10 kDa zein; not legible in this copy.]

Fig. 3B

Probe M:
Prot. seq.        Met Met Tyr  Thr Met Met Gln
Coding strand  5' ATG ATG TAPy ACN ATG ATG CAPu 3'
Oligo. seq.    3' TAC TAC ATPu TGN TAC TAC GT 5'

Probe G:
Prot. seq.        Met Gln  Tyr  Thr Met Met Gln
Coding strand  5' ATG CAPu TAPy ACN ATG ATG CAPu 3'
Oligo. seq.    3' TAC GTPy ATPu TGN TAC TAC GT 5'
Fig. 3A, B. A N-terminal amino acid sequence of the 10 kDa zein polypeptide from BSSS-53. [...] Underlined residues indicate the region of amino acid residues that was used for the prediction and synthesis of the oligonucleotide probes. Parentheses indicate amino acid residues that were tentatively identified. Sequence microheterogeneity was detected at residues 12 and 21. B Sequences of the synthetic mixed oligonucleotide probes as derived from the amino acid sequence. Pu, purine; Py, pyrimidine; N, any base.



amino-terminal residue was identified as threonine in initial sequence analyses and as glutamine in a subsequent analysis; however, this was the only disagreement in the data. Two positions yielded more than a single residue: residue 12 had both asparagine and proline, and residue 21 had both glutamine and methionine (Fig. 3 and Table 2). In the identification of threonine at residue 23, although the yield was poor, there appeared to be a small amount of dehydrothreonine present.

The derived nucleotide sequence for the region between amino acid residues 20 and 26 was chosen for the synthesis of two mixed oligonucleotide probes of 20 nucleotides in length (Fig. 3B). One of the probes (probe M) reflected the methionine at residue 21, while the second probe (probe G) reflected the glutamine at this position. The oligonucleotides were designed to be complementary to the mRNA, and therefore to the coding strand of the DNA, so that positive clones could be quickly verified by DNA sequencing using the oligonucleotide probes as sequencing primers. The oligonucleotide probes were specific for the 10 kDa zein, since the region chosen contained 3 (probe G) or 4 (probe M) methionine residues. With the exception of the 15 kDa zein, methionine is a rare (1%-2%) amino acid in other zeins. The mature 15 kDa zein contains [...] methionine residues (Marks et al. 1985b; Pedersen et al. 1986) but has no homology to the oligonucleotides.
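The degeneracy of the two mixed probes can be checked by expanding the Fig. 3B notation. The sketch below is illustrative rather than the authors' procedure; it assumes, as the oligonucleotide row of Fig. 3B indicates, that each 20-mer covers residues 20-26 but omits the third base of the final Gln codon. The function and constant names are hypothetical.

```python
from itertools import product

# Degenerate symbols as used in Fig. 3B: Pu = purine, Py = pyrimidine, N = any base.
SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T",
           "Pu": "AG", "Py": "CT", "N": "ACGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tokenize(spec):
    """Split a spec such as 'ATGCAPuTAPy' into one symbol per nucleotide position."""
    tokens, i = [], 0
    while i < len(spec):
        if spec[i:i + 2] in ("Pu", "Py"):  # two-letter degenerate symbols
            tokens.append(spec[i:i + 2])
            i += 2
        else:
            tokens.append(spec[i])
            i += 1
    return tokens


def probe_mixture(coding_spec):
    """All oligonucleotides (written 5'->3') complementary to a degenerate coding strand."""
    choices = [SYMBOLS[t] for t in tokenize(coding_spec)]
    coding_variants = ("".join(p) for p in product(*choices))
    # Reverse-complement each coding-strand variant to obtain the antisense oligo.
    return sorted({"".join(COMPLEMENT[b] for b in reversed(v)) for v in coding_variants})


# Coding strand for residues 20-26, truncated to 20 nt (last Gln codon read as 'CA').
PROBE_M = "ATGATGTAPyACNATGATGCA"   # Met at residue 21
PROBE_G = "ATGCAPuTAPyACNATGATGCA"  # Gln at residue 21

for name, spec in (("M", PROBE_M), ("G", PROBE_G)):
    oligos = probe_mixture(spec)
    print(name, len(oligos), "oligonucleotides of", len(oligos[0]), "nt")
```

Under these assumptions probe M is a mixture of 8 sequences and probe G of 16, because probe G carries one extra two-fold degenerate position in the Gln codon.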

Screening the cDNA library

Southern blot analysis of plasmid DNA isolated from [...] of the [...] size fractions of the cDNA library indicated that fraction [...] contained most of the sequences hybridizing to the oligonucleotide probes (data not shown). Approximately 20 000 colonies from fraction [...] were screened by colony hybridization to the 2 mixed oligonucleotide probes. Approximately 200 colonies showed strong hybridization to probe G after washing the filters at [...] °C. Probe M hybridized to the majority of the same colonies after washing the filters at 25 °C. However, no hybridization above background was detected with probe M after the filters were washed at 3[...] °C. Therefore, [...] positive clones detected with probe G were chosen for further analysis. Using probe G as a sequencing primer, we were able to identify a clone which encoded a polypeptide with the same amino acid sequence as the 10 kDa zein. This clone, designated 10kZ-1, was chosen for complete DNA sequence determination.
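One way to read these wash results is in terms of mismatches against the clone eventually isolated, which, as reported later in this section, encodes Gln at residue 21 and Cys at residue 23. The sketch counts positions at which no base allowed by a probe can pair with the template. It is an illustration, not an analysis reported in the paper; because only the encoded amino acids are known here, the template is a hypothetical stand-in written with degenerate codons.

```python
# Bases allowed at each position (Pu = purine, Py = pyrimidine, N = any base).
SYMBOLS = {"A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
           "Pu": {"A", "G"}, "Py": {"C", "T"}, "N": {"A", "C", "G", "T"}}


def tokenize(spec):
    """Split a degenerate spec into one symbol per nucleotide position."""
    tokens, i = [], 0
    while i < len(spec):
        step = 2 if spec[i:i + 2] in ("Pu", "Py") else 1
        tokens.append(spec[i:i + step])
        i += step
    return tokens


def unavoidable_mismatches(probe_spec, template_spec):
    """1-based positions where no base allowed by the probe can match the template.

    Both specs are written as the coding (mRNA-sense) strand, so a mismatch here
    is also a mismatch between the antisense oligonucleotide and the mRNA.
    """
    p, t = tokenize(probe_spec), tokenize(template_spec)
    if len(p) != len(t):
        raise ValueError("specs must cover the same number of positions")
    return [i for i, (x, y) in enumerate(zip(p, t), start=1) if not SYMBOLS[x] & SYMBOLS[y]]


# Residues 20-26; the final Gln codon is truncated to 'CA' as in Fig. 3B.
PROBE_M_CODING = "ATGATGTAPyACNATGATGCA"   # Met at 21, Thr at 23
PROBE_G_CODING = "ATGCAPuTAPyACNATGATGCA"  # Gln at 21, Thr at 23
# Hypothetical stand-in for 10kZ-1: Gln (CAPu) at 21 and Cys (TGPy) at 23.
TEMPLATE_10KZ1 = "ATGCAPuTAPyTGPyATGATGCA"

for name, probe in (("G", PROBE_G_CODING), ("M", PROBE_M_CODING)):
    print(name, unavoidable_mismatches(probe, TEMPLATE_10KZ1))
```

Under these assumptions probe G has two unavoidable mismatches (in the Thr/Cys codon) and probe M four, which is consistent with, though not proof of, probe M losing its signal at the higher wash temperature.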

Analysis of the sequence of 10kZ-1

The DNA sequence of 10kZ-1 is shown in Fig. 4. This cDNA clone encodes a polypeptide of [...] amino acids preceded by a leader peptide of 21 amino acids. The site of cleavage of the leader peptide was determined by comparing the amino acid sequence of the mature 10 kDa zein protein with the derived amino acid sequence of the cDNA clone. As expected from the results of the colony hybridization, the cDNA clone encodes a polypeptide with a glutamine rather than a methionine at residue 21 of the mature polypeptide. The cDNA clone has 21 nucleotides 5' to the ATG and [...] nucleotides 3' to the TAG stop codon. There is a consensus poly(A) addition signal (AATAAA) 25 nucleotides 5' to the poly(A) tail, similar to other eukaryotic genes
Table 2. Yields and identification of the products generated by automated Edman degradation

[Table body (position, amino acid and yield for residues 1-30) not recoverable from this copy.]

Only 5% of each product generated by the sequencer was analyzed. Yields listed here were normalized to 100% injection.



(Nevins [...]). The amino acid composition of the mature polypeptide encoded by the cDNA clone agrees with the amino acid analysis of the 10 kDa zein proteins (Table 1).

It is interesting to note that the DNA sequence of this clone differs from that predicted by the protein sequence. At amino acid position 23, the cDNA clone encodes a cysteine, while a threonine residue was identified in the N-terminal amino acid sequence. This discrepancy may represent an allelic difference, since the protein was isolated from the inbred line BSSS-53, while the cDNA library was prepared from poly(A) RNA from W22. Alternatively, this residue might represent an additional sequence microheterogeneity which went undetected (as discussed earlier, the protein was not derivatized prior to amino acid sequence analysis, and cysteine could not be identified). With the exception of amino acids that were only tentatively identified, the remainder of the predicted amino acid sequence agreed precisely with the N-terminal amino acid sequence.



Developmental expression of the [...]

It had been shown that the level of 10 kDa zein protein was higher in BSSS-53 seeds than in W23 seeds (Phillips and McClure [...]). To determine whether the differential accumulation of the 10 kDa zein protein in mature kernels of BSSS-53 and W23 was correlated with differential levels of 10 kDa zein RNA in the developing endosperm, we analyzed RNA from the progeny of self-pollinated W23 and



[Fig. 4: nucleotide sequence of 10kZ-1 with the derived amino acid sequence; not recoverable from this copy.]

Fig. 4. Nucleotide sequence and derived polypeptide sequence of 10kZ-1. The arrow indicates the N-terminal amino acid of the mature polypeptide, as determined by N-terminal amino acid sequence analysis. The sequence preceding the arrow encodes a 21-amino acid signal peptide.
BSSS-53 plants. Total RNA was prepared from endosperm tissue isolated at [...] time points post-pollination. The RNA samples were compared by Northern blot analysis using the 10 kDa zein probe described in Materials and methods. As shown in Fig. 5, 10 kDa zein transcripts were first detected at 12 days post-pollination. The level of 10 kDa zein transcripts reached a peak at 15-18 days post-pollination and declined after that point. This pattern of developmental expression is similar to the results obtained for other zein genes (Marks et al. 1985a); i.e., zein transcripts were first observed at approximately 12 days post-pollination, their levels peaked between [...] and 21 days post-pollination, and they declined slowly after that time. As shown in Fig. 5, 10 kDa zein RNA levels were significantly higher in BSSS-53 than in W23 at all time points analyzed.

[...] of the 10 kDa zein gene copy number

A possible mechanism for the elevated 10 kDa zein RNA levels in seeds of BSSS-53 is through amplification of the 10 kDa zein structural genes. Therefore, we compared the genomic DNAs of BSSS-53 and W23 by Southern blot hybridization. Genomic DNA was isolated from seedlings of BSSS-53, W23 and the cross W23 × BSSS-53. The DNA samples were analyzed by Southern blot hybridization using the 10 kDa zein probe (Fig. 6). Comparison of the intensity of hybridization of the probe to genomic DNA with the gene-copy reconstruction indicated that the 10 kDa zein gene(s) is present in only one or two copies in both W23 and BSSS-53. This result demonstrated that there was no gross amplification of the 10 kDa zein genes in BSSS-53. The results presented in Fig. 6 also demonstrate the existence of restriction fragment length polymorphisms
[...] zein-2 [...].

The 22 kDa and 19 kDa zeins are encoded by a complex multigene family with a pool of active and inactive genes (reviewed in Heidecker and Messing 1986). In contrast, the 15 kDa and 27 kDa zeins are each encoded by only one or two genes ([...]; [...] Messing [...]).

The analysis of gene copy number is supported by isoelectric focusing analysis and two-dimensional gel electrophoresis of zein polypeptides. While the zein-1 polypeptides show extensive charge heterogeneity ([...] et al. [...]; Hagen and Rubenstein [...]; [...] et al. [...]), it has been reported that the 27 kDa, [...] zeins are each represented by polypeptides of a single isoelectric point (Hurkman et al. [...]).
[...] 10 kDa zein subclass [...] amino acid sequence [...] polypeptides. However, since positive colonies were only detected with probe G, it is unclear at this time whether or not the [...].
Zein polypeptides are characterized by their high content of proline, glutamine, leucine and alanine (Gianazza et al. [...]; Wilson [...]). The 27 kDa, 15 kDa and 10 kDa zeins are distinguished from the 22 kDa and 19 kDa classes by their increased content of cysteine and methionine (Gianazza et al. [...]; [...] et al. [...]). It has been proposed ([...] et al. [...]) that these polypeptides interact through intermolecular disulfide bonds, which results in their efficient extraction only under reducing conditions. The 10 kDa zein is remarkable for its extremely high methionine content (22.5%). With the exception of the 15 kDa zein, where methionine constitutes approximately [...]% of the amino acids (Marks et al. 1985b; Pedersen et al. 1986), methionine is a rare (1%-2%) amino acid in other zeins.
In the maize kernel, zein polypeptides are [...] sequestered in membrane-bound granules called protein bodies (Wolf et al. [...]). The deposition of zein polypeptides into protein bodies [...] cotranslational import into the rough endoplasmic reticulum (Larkins and Hurkman 1978; Burr and Burr [...]). The 10 kDa zein cDNA clone encodes a polypeptide which is 21 amino acids longer at the N-terminus than the mature polypeptide. The sequence of the N-terminal 21 amino acids shows striking homology to signal peptides of other zeins (Messing 1987). Therefore, we believe that this sequence constitutes a signal peptide, and it is likely that the 10 kDa zein is deposited into protein bodies in the endosperm.

The level of the 10 kDa zein protein was previously shown to be higher in seeds of BSSS-53 than in seeds of W23 (Phillips and McClure [...]). This difference was correlated with differences in 10 kDa zein RNA in developing endosperms from these two inbred lines (Fig. 5). At all time points analyzed, 10 kDa zein transcripts were more abundant in BSSS-53 as compared to W23, while the overall developmental profile appeared to be unaltered. Quantitative data indicate that 10 kDa zein RNA levels are 2- to 5-fold higher in BSSS-53 than in W23, depending on the developmental time point (J. Kirihara and J. Messing, in preparation). The increased 10 kDa zein RNA levels may be due to increased transcription of the 10 kDa zein gene(s) in BSSS-53, or possibly to a difference in stability of 10 kDa zein transcripts between the two inbred lines. Regardless of the cause, however, it is likely that the increased level of 10 kDa zein RNA contributes to the increased level of 10 kDa zein protein found in the mature seed.

The increased expression of the 10 kDa zein in BSSS-53 represents an interesting example of differential gene expression. While mutations such as opaque-2 (Misra et al. 1972) and floury-2 (Nelson et al. 1965; [...] et al. [...]) result in a decrease in zein proteins in the seed, in BSSS-53 seeds a subclass of zein proteins is increased. In opaque-2 mutants, 22 kDa zein mRNA and protein levels are drastically reduced (Misra et al. 1972; Soave et al. [...]; Pedersen et al. [...]; Burr and Burr [...]). The opaque-2 mutation is located on maize chromosome 7, unlinked to some of the zein genes whose expression it affects (Soave et al. [...]). The opaque-2 gene [...] to be a regulatory gene involved in zein gene expression. The genetic element responsible for overexpression of the 10 kDa zein protein is located on [...] (Benner and Phillips [...]). Recently, it has been determined that this element is not linked to the 10 kDa zein structural gene(s) in BSSS-53 (M. Benner and R. Phillips, personal communication). Since the element responsible for the overexpression is not linked to the structural gene(s), it may represent a regulatory gene which enhances the expression of the structural gene(s). In contrast to opaque-2, which affects the expression of a large family of genes, molecular analysis of the overexpression of the 10 kDa zein may be simplified due to the small number of 10 kDa zein genes.
[...] The primary amino acid sequence of an abundant methionine-rich seed protein [...] (Bertholletia excelsa H.B.K.) has been elucidated by protein sequencing and from the nucleotide sequence of cDNA clones. The 9 kDa subunit of this protein was found to contain [...] amino acids, of which [...] were methionine (18%) and 6 were cysteine (8%). Over half of the methionine residues in this subunit are clustered in two regions of the polypeptide, where they are interspersed with arginine residues. In one of these regions, the methionine residues account for 5 out of 6 amino acids, and four of these methionine residues are contiguous. The sequence data verify that the Brazil nut sulfur-rich protein is synthesized as a precursor polypeptide that is considerably larger than either of the two subunits of the mature protein. Three proteolytic processing steps by which the encoded polypeptide is sequentially trimmed to the 9 kDa and 3 kDa subunit polypeptides have been correlated with the sequence information. In addition, we have found that the sulfur-rich protein from Brazil nut is homologous in its amino acid sequence to small water-soluble proteins found in two other oilseeds, castor bean (Ricinus communis) and rapeseed (Brassica napus). When the amino acid sequences of these three proteins are aligned to maximize homology, the arrangement of cysteine residues is conserved. However, the two subunits of the Brazil nut protein contain over [...]% methionine, whereas the homologous proteins from castor bean and rapeseed contain only 2.1% and 2.6% methionine, respectively.



Introduction 

In contrast to the seed proteins from many plants, which contain relatively low levels of the sulfur-containing amino acids, the seed proteins from Brazil nut (Bertholletia excelsa H.B.K.) contain large percentages of methionine and cysteine, 3.3%-9.1% by weight [3, 26]. From a 2S albumin fraction of Brazil nut proteins, we previously purified an abundant sulfur-rich protein. This sulfur-rich protein consists of two low molecular weight subunits, a 9 kDa polypeptide and a 3 kDa polypeptide, which associate through disulfide bridges to form a 12 kDa protein molecule (unpublished data). The sulfur-rich protein is synthesized in the seed only at a particular developmental stage, about 8 to [...] months after flowering. In vitro and in vivo labelling studies have indicated that this protein is synthesized initially as a larger precursor polypeptide of about 1[...] kDa, which then undergoes three proteolytic processing steps before it attains its mature form [2].

We now report a partial amino acid sequence of the large subunit of the sulfur-rich protein, obtained by Edman degradation. Using a synthetic oligodeoxynucleotide probe whose sequence was derived from the partial amino acid sequence, we have identified cDNA clones encoding the sulfur-rich protein. In this paper, we present the complete nucleotide sequence of one Brazil nut cDNA clone and verify that the sulfur-rich protein encoded by this clone is synthesized as a larger precursor polypeptide. We have correlated the three processing steps in which the encoded polypeptide is sequentially trimmed to the 9 kDa and 3 kDa polypeptides with the sequence information, and demonstrate that the 9 kDa subunit encoded by this clone contains 18% methionine and 8% cysteine. Finally, a computer search of available protein sequences reveals that the methionine-rich protein from Brazil nut is homologous in its amino acid sequence to small water-soluble seed proteins found in castor bean and rapeseed, which contain only modest levels of methionine.



Materials and methods 

Plant material 

Brazil nuts are indigenous to the Amazon River basin; they do not grow anywhere in the United States. Brazil nut fruits were obtained approximately 9 months after flowering from Brazil (Manaus) or Peru (Iquitos or Puerto Maldonado).



Purification of the sulfur-rich protein and amino acid sequence determination

Brazil nut embryos were ground into a fine paste and defatted by extraction with hexane. The resulting defatted Brazil nut flour was then extracted in a buffer containing 1 M NaCl in 0.0[...] M sodium phosphate buffer, pH [...]. The sulfur-rich protein was purified from this crude extract by the procedure of Youle and Huang [26]. The resulting sucrose gradient fractions were dialyzed extensively against deionized water at 4 °C to precipitate the contaminating globulin proteins. The final protein sample contained polypeptides of 9 kDa and 3 kDa when analyzed on SDS-20% polyacrylamide gels.



[...] was prepared [...] of the pure sulfur-rich protein [...] containing [...] and [...] 2-mercaptoethanol at [...] for [...] hours. At the end of the [...], iodoacetic acid was added to a final concentration of [...], and the sample was incubated at [...] for [...] minutes. After this treatment, the protein sample was dialyzed extensively against deionized water and lyophilized.

Sequence analysis of the sulfur-rich protein was performed by automated Edman degradation [6] on a Beckman [...]-phase sequenator using a [...] program [...]0783 ([...] Inc.). [...] nmol [...] the liquid [...] added to the [...] fractions and used as an internal standard for quantitation of each cycle. Phenylthiohydantoin-amino acids were identified by [...] gas-liquid chromatography [20], high performance liquid chromatography (Waters 6000A), and thin-layer chromatography [...]. At least two of these methods were used at each step. A total of [...] cycles of degradation was conducted, and a 9[...]% repetitive yield was observed for PTH-amino acid [...].
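The repetitive yield sets how quickly the PTH-amino acid signal decays over successive cycles, which limits how far a sequence can be read. A minimal sketch, assuming the common approximation that the yield at cycle n is the first-cycle yield multiplied by Y^(n-1); the value Y = 0.95 is hypothetical, since the reported figure is illegible in this copy, and the function names are illustrative.

```python
def relative_yield(repetitive_yield, cycle):
    """Expected yield at `cycle` relative to cycle 1, assuming a constant repetitive yield."""
    return repetitive_yield ** (cycle - 1)


def last_cycle_above(repetitive_yield, threshold):
    """Last cycle whose relative yield is still at or above `threshold`."""
    if not (0 < repetitive_yield < 1 and 0 < threshold <= 1):
        raise ValueError("need 0 < repetitive_yield < 1 and 0 < threshold <= 1")
    cycle = 1
    while relative_yield(repetitive_yield, cycle + 1) >= threshold:
        cycle += 1
    return cycle


Y = 0.95  # hypothetical repetitive yield
# Residue 57 is the last position assigned in the major sequence (see Results).
print(f"cycle 57: {relative_yield(Y, 57):.3f} of the first-cycle yield")
print("last cycle at or above 10%:", last_cycle_above(Y, 0.10))
```

With Y = 0.95, only about 6% of the first-cycle signal would remain at cycle 57; a lower repetitive yield would shorten the readable stretch considerably.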



Preparation of cDNA library and isolation of clones

Polyadenylated RNA was prepared from [...]-month-old developing Brazil nut seeds [...] methods [...] and was cloned in the [...]. The resulting clones were screened by colony hybridization [...] with a probe which consisted of a mixture of 6 synthetic oligodeoxynucleotides complementary to the 6 possible RNA sequences which could encode a methionine-rich region found in the partial amino acid sequence of the 9 kDa subunit of the sulfur-rich protein (Fig. 1B). The probe was hybridized to the filters in 6× NET (0.9 M NaCl, 0.09 M Tris-HCl pH [...], 0.006 M EDTA), 0.[...]% NP-40, and [...] yeast tRNA at [...] °C for 20 hours. The filters were






[...] °C before autoradiography.



Sequencing of cDNA and primer extension analysis

The sequence of cDNA clone pHS-3 was determined from both DNA strands by the dideoxy chain termination method [12]. Where necessary, regions of the clone were also sequenced by the method of Maxam and Gilbert [13]. The 25 nucleotides at the 5' end of the mRNA encoding the sulfur-rich protein were not represented in pHS-3 but were obtained by using a synthetic oligodeoxynucleotide complementary to nucleotides [...] as a primer to synthesize DNA complementary to the 5' end of the mRNA and sequencing the resulting extension product. For primer extension, the oligodeoxynucleotide 5' AATCTTCGCCATGGTGATTCT 3', labelled at its 5' end, was annealed to 3 µg of poly(A)+ RNA from the seeds of 9-month-old Brazil nuts in [...] mM Tris pH 7.5, 5 mM EDTA at 90 °C for 5 minutes. NaCl was added to 0.[...] M, and the sample was incubated for 20 minutes at 90 °C followed by 15 minutes at 25 °C. The annealed DNA sample was brought to a final concentration of [...] mM Tris pH 8.3, 5 mM DTT, 15 mM MgCl2, 0.5 mM dNTPs, and 0.1 mg/ml BSA. AMV reverse transcriptase (BRL, [...] units) was added, and the reaction was incubated at 3[...] °C for 90 minutes. EDTA was added to 20 mM, and the sample was extracted twice with phenol:chloroform:isoamyl alcohol (25:24:1) and precipitated with ethanol. After denaturation, the samples were subjected to electrophoresis on an [...]% sequencing gel. Three bands resulted which differed in length by single nucleotides. DNA from each of the three bands was eluted from the gel and sequenced by the method of Maxam and Gilbert [13].
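The site on the mRNA to which the primer-extension oligonucleotide anneals is its reverse complement. This short sketch computes it from the printed primer; the ATG it happens to contain is a property of that printed sequence, and whether it is the codon marked in Fig. 2 is not established by the calculation. Names are illustrative.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


PRIMER = "AATCTTCGCCATGGTGATTCT"        # primer-extension oligonucleotide, 5'->3'
mrna_sense = reverse_complement(PRIMER)  # annealing site in mRNA sense (as DNA), 5'->3'

print(len(PRIMER), "nt:", mrna_sense)
print("ATG at 0-based offset", mrna_sense.find("ATG"))
```

The primer is 21 nucleotides long, matching the "21 base synthetic oligodeoxynucleotide" described in the Results.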



Hybrid-selected translation of cDNA clones

Characterization of cDNA clones by hybrid-selected translation was performed as described by Maniatis [12]. Three micrograms of either pHS-3 or [...] plasmid DNA were denatured, bound to nitrocellulose paper and hybridized to 2 [...] of poly(A)+ RNA prepared from 9-month-old Brazil nut seeds. RNA which was specifically bound to the DNA was then eluted, precipitated with ethanol along with 5 µg carrier yeast tRNA, and translated in a wheat germ system. In addition, the translation products directed by RNA selected by pHS-3 were immunoprecipitated with a polyclonal antibody which had been made against a mixture of the 9 kDa and 3 kDa components of the mature Brazil nut sulfur-rich protein and its 12 kDa precursor. The proteins, labelled with [35S]methionine, were analyzed on an SDS-20% polyacrylamide gel and visualized by autoradiography.



Partial amino acid sequence of the sulfur-rich protein



Two amino acid sequences were obtained from the analysis of the carboxymethylated sulfur-rich protein: one major (80%) and one minor (20%). The major sequence starts with Pro-Arg-Arg-Gly-Met... as NH2-terminal amino acids, while the minor one starts with Gly-Met... (Fig. 1A). The two protein sequences are identical in the region sequenced, except that the minor one is three amino acids shorter than the major one at the NH2 terminus; thus we were able to determine the first 57 amino acids for the major sequence and the first 54 amino acids for the minor one. The sulfur-rich protein consists of two subunits, a 9 kDa polypeptide and a 3 kDa polypeptide. Both the 54 and the 57 amino acid sequences exceed the length of the 3 kDa polypeptide; thus these sequences must represent 9 kDa polypeptides, possibly two members of a 9 kDa polypeptide family. We did not obtain any amino acid sequence for the 3 kDa polypeptide, suggesting that either the 3 kDa polypeptide sequence is identical to the 9 kDa sequence or the NH2 terminus of the 3 kDa polypeptide is blocked (see Discussion).

[Fig. 1A: partial amino acid sequence of the 9 kDa subunit; not recoverable from this copy.]

Fig. 1. A The partial amino acid sequence of the 9 kDa subunit of the sulfur-rich protein from Brazil nut. After carboxymethylation, the purified sulfur-rich protein was sequenced using an automatic [...]-phase sequenator. Two sequences, one major (80%) with Pro as the NH2-terminal amino acid (shown in the first line) and one minor (20%) with Gly as the NH2-terminal amino acid (shown in the second line), were detected. Amino acid residues found in the minor sequence which are identical to those in the major sequence are indicated with dashes. Methionine-rich regions found in the partial sequence are highlighted. B Amino acid sequence of the first methionine-rich region, which was used as a basis for the synthesis of the oligodeoxynucleotide probe. The sequences of the 6 possible mRNAs encoding this portion of the protein sequence are shown in the second line, and the [...] included in the synthetic oligodeoxynucleotide probe complementary to the mRNA are shown in the bottom line.

This amino acid sequence represents about [...]% of the 9 kDa subunit. The sequence contains unusually high levels of the sulfur amino acids: 21% methionine and 7% cysteine. There are two regions in the partial amino acid sequence where methionine residues are clustered with arginine residues: residues 29-35 (Arg-Met-Met-Met-Arg-Met-Met) and residues 47-52 (Met-Arg-Arg-Met-Met-Arg) (Fig. 1A).

Identification and characterization of cDNA clones encoding the sulfur-rich protein

An oligodeoxynucleotide probe was synthesized (by Biosearch, Inc.) which was complementary to the 6 possible mRNA sequences encoding one of these methionine-rich regions (amino acid residues 30-35) (Fig. 1B). This oligodeoxynucleotide probe hybridized to a number of clones from a cDNA library prepared using RNA from 9-month-old Brazil nut seeds. Twelve of these clones, with inserts ranging from 350 bp to 700 bp, were selected for further analysis.
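The probe design above rests on codon degeneracy. If the cluster at residues 29-35 is read as Arg-Met-Met-Met-Arg-Met-Met (an assumption, since the source text is damaged), residues 30-35 are Met-Met-Met-Arg-Met-Met: methionine has a single codon and arginine has six, which gives exactly the 6 possible mRNAs. The Python sketch below enumerates them and writes the complementary DNA oligonucleotide for each; the function names are illustrative and do not come from the paper.

```python
from itertools import product

# Codon choices for the residues in the targeted region (standard genetic code).
# Met (M) has a single codon; Arg (R) has six.
CODONS = {
    "M": ["AUG"],
    "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
}

# RNA base -> complementary DNA base, used to write the antisense probe.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def possible_mrnas(peptide: str) -> list:
    """Enumerate every mRNA sequence (5'->3') that could encode `peptide`."""
    choices = [CODONS[aa] for aa in peptide]
    return ["".join(codons) for codons in product(*choices)]


def antisense_probe(mrna: str) -> str:
    """Return the DNA oligonucleotide complementary to `mrna`, written 5'->3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


if __name__ == "__main__":
    region = "MMMRMM"  # assumed reading of residues 30-35
    mrnas = possible_mrnas(region)
    print(f"{len(mrnas)} possible mRNAs")
    for mrna in mrnas:
        print(mrna, antisense_probe(mrna))
```

Because only the arginine position varies, all six complementary sequences can be synthesized together as one mixed oligonucleotide pool.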

Fig. 2. (A) … The sequence of nucleotides 26 to 624 as well as the poly(A) tract was determined from the analysis of the cDNA clone pHS-3. The first 25 bases were obtained by sequencing the largest primer extension product, which was synthesized using a synthetic oligodeoxynucleotide complementary to nucleotides 49-69 as a primer. The first three nucleotides were uncertain from sequencing and are represented by NNN in the sequence. The first ATG codon from the 5' end (nucleotides 57-59) and the termination codon TGA (nucleotides 495-497) are marked in boxes. The amino acid sequence deduced from the nucleotide sequence of the resulting open reading frame is shown in the second line, and the major 5…-residue partial amino acid sequence determined from analysis of the purified sulfur-rich protein is shown in line 3. The approximate sites of cleavages which may be involved in the maturation of the sulfur-rich protein are marked with arrows above the nucleotide sequence. ATG codons in the DNA sequence as well as methionine residues in the protein sequence are highlighted. Numbers in the right margin refer to the number of nucleotides or amino acids.

Sequence analysis of one of these clones, pHS-3, demonstrates unequivocally that this cDNA encodes a polypeptide which is extremely rich in the sulfur-containing amino acids (Fig. 2A). The sequence of pHS-3 is 599 nucleotides long excluding the poly(A) tail. By primer extension analysis using a 21-base synthetic oligodeoxynucleotide complementary to a region near the 5' end of pHS-3, we determined that this cDNA clone falls 25 nucleotides short of the 5' end of the mRNA encoding the sulfur-rich protein. There are no ATG codons in the sequence of the primer extension product; thus, the first ATG codon found (nucleotides 57-59) represents the initiation codon for protein synthesis. This ATG fits the consensus sequence for eukaryotic protein initiation sites [10]. A stop codon, TGA, is encoded by nucleotides 495-497 in pHS-3. The resulting open reading frame could encode a polypeptide of 146 amino acids, of which over 20% are sulfur-containing amino acids: …% of these residues are methionine while 5.5% are cysteine. The first portion of the encoded polypeptide contains a large proportion of hydrophobic residues; of the 21 residues at the amino terminus of the protein, 36% are alanine and 18% are leucine. In comparison, the rest of the polypeptide is rich in arginine, glutamine and glutamic acid, a composition which is characteristic of other plant seed storage proteins. A hydropathy plot (Fig. 2B) demonstrates that the amino terminus of the polypeptide is hydrophobic while the remainder of the polypeptide is largely hydrophilic.
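The reasoning used to define the open reading frame (take the first ATG as the start and read codon by codon to the first in-frame stop) can be checked with a short Python sketch; the helper names are illustrative. For example, an ATG beginning at nucleotide 57 and a TGA beginning at nucleotide 495 are 438 nucleotides apart, which is 146 codons, matching the 146-residue polypeptide described above. The sketch scans only the given strand and ignores the initiation-context consensus mentioned in the text.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_atg_orf(seq: str) -> Optional[Tuple[int, int, int]]:
    """Locate the open reading frame that begins at the first ATG in `seq`.

    Returns (atg_start, stop_start, n_residues) as 1-based positions:
    atg_start is the A of the ATG, stop_start is the first base of the
    first in-frame stop codon, and n_residues counts the amino acids
    encoded before that stop. Returns None if no ATG or no in-frame stop
    is found.
    """
    seq = seq.upper()
    i = seq.find("ATG")
    if i < 0:
        return None
    # Walk codon by codon in the ATG's reading frame.
    for j in range(i, len(seq) - 2, 3):
        if seq[j:j + 3] in STOP_CODONS:
            return i + 1, j + 1, (j - i) // 3
    return None


def residues_between(atg_start: int, stop_start: int) -> int:
    """Amino acids encoded from an ATG up to (not including) an in-frame stop."""
    span = stop_start - atg_start
    if span <= 0 or span % 3:
        raise ValueError("start and stop codons are not in the same frame")
    return span // 3


if __name__ == "__main__":
    print(residues_between(57, 495))
    print(first_atg_orf("CCATGAAATTTTGACC"))
```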

By aligning the amino acid sequence derived from the nucleotide sequence with the major sequence determined from the purified 9 kDa subunit, we have found that the coding region for the 9 kDa polypeptide begins … nucleotides from the 5' end of the mRNA. By adding the molecular weights of the individual amino acids encoded by this region, we arrive at a value of approximately 9 kDa. The amino acid sequence derived from the nucleotide sequence of this portion of the open reading frame agrees quite well, although not precisely, with the major 5…-residue partial amino acid sequence of the 9 kDa subunit of the sulfur-rich protein (Fig. 2A).
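The molecular-weight estimate described above, which sums the masses of the encoded amino acids, can be sketched as follows. The residue masses are commonly tabulated average values, not figures taken from the paper. The sketch ignores disulfide bonds and any other modification, so it gives only an approximate mass for an unmodified chain; the function name is illustrative.

```python
# Average residue masses (Da) of the 20 standard amino acids, i.e. the
# free amino acid mass minus one water; commonly tabulated values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water is added back for the free chain termini


def polypeptide_mass(seq: str) -> float:
    """Approximate average mass (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


if __name__ == "__main__":
    print(round(polypeptide_mass("PMMMRMM"), 1))
```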
Methionine residues are very predominant in the 9 kDa subunit of the mature protein. There are 14 methionine residues in this region, representing 18% of the amino acids in the polypeptide. Eight of these 14 methionines are found clustered with arginine residues in two regions of the polypeptide. In the first cluster, between amino acid residues 99 and 104, five out of six residues are methionine, and four of the methionine residues are contiguous. The second methionine cluster, between amino acid residues 116 and 121, includes three methionine residues and three arginine residues. Interestingly, … of the 4 amino acid differences found between the amino acid sequence determined from the protein and that derived from the nucleotide sequence are found in the methionine-rich region that was used as a basis for the synthetic oligodeoxynucleotide probe. A second cDNA clone selected by the same probe was perfectly homologous with one of the sequences represented in the probe (unpublished data), suggesting that the sulfur-rich protein is encoded by a family of genes with sequence variation in these methionine-rich regions. The 9 kDa subunit of the Brazil nut protein also contains a high proportion of cysteine (7.7%).
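The composition figures quoted for the 9 kDa subunit are residue counts divided by chain length. A minimal Python sketch with illustrative function names follows. The 77-residue chain in the example is a hypothetical length chosen only to show the arithmetic (14 methionines in 77 residues is about 18%); the exact subunit length is not stated here.

```python
from collections import Counter


def percent_of(seq: str, residues: str) -> float:
    """Percentage of positions in `seq` occupied by any of `residues` (one-letter codes)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    # set() avoids double-counting if a residue letter is listed twice
    return 100.0 * sum(counts[r] for r in set(residues.upper())) / len(seq)


def percent_from_count(count: int, length: int) -> float:
    """Percentage when only the residue count and chain length are known."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * count / length


if __name__ == "__main__":
    # Hypothetical example: 14 Met in a 77-residue chain.
    print(f"{percent_from_count(14, 77):.1f}% methionine")
```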

By hybrid-selected translation, we have found that pHS-3 is able to select an mRNA from a population of 9-month-old Brazil nut RNAs which directs the synthesis of an 18 kDa polypeptide in vitro (Fig. 3). This 18 kDa polypeptide is immunoprecipitable with a polyclonal antibody raised in rabbits against the purified Brazil nut sulfur-rich protein, demonstrating conclusively that the sulfur-rich protein is synthesized initially as a larger precursor polypeptide.

Homology of the sulfur-rich protein to other water-soluble seed proteins

In a computer search of proteins whose amino acid sequences have been determined, we found that the sulfur-rich protein from Brazil nut shares a great deal of homology with both the large and the small subunits of a low molecular weight, water-soluble seed storage protein from castor bean (Ricinus communis) [23]. We have aligned the amino acid sequence of the small subunit of this castor bean protein with the Brazil nut sequence starting at amino acid residue 35, and that of the large subunit of the castor bean protein with the Brazil nut sequence starting at amino acid residue … (Fig. 4A). Allowing small gaps in both the small-subunit and the large-subunit comparisons to maximize sequence homology, we find over …% homology between the castor bean protein and the Brazil nut sulfur-rich protein. Both proteins are high in glutamine, glutamic acid and arginine (22% and …% for the Brazil nut protein and 29.5% and …% for the castor bean protein, respectively), and the positions of many of these residues are conserved in the two proteins. Interestingly, both the Brazil nut and the castor bean proteins are relatively rich in cysteine (…% and 8.4%, respectively), and the positions of these residues are similar in both proteins. Another small water-soluble protein found in rapeseed (Brassica napus), napin [5], shows some homology (about 21%) with the Brazil nut protein (Fig. 4A).
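The homology percentages above come from alignments that include a few small gaps. The text does not state how gap positions were counted, so the Python sketch below adopts one common convention as an assumption: identity is computed only over columns where neither sequence has a gap. Other conventions, such as dividing by the length of the shorter sequence, give different percentages. The function name is illustrative.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns containing a gap in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if gap not in (x, y)]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


if __name__ == "__main__":
    # Toy alignment: 3 identities over 4 gap-free columns.
    print(percent_identity("AC-GT", "ACTGA"))
```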
Although the homology noted between the Brazil nut and rapeseed proteins is substantially less than that between the Brazil nut and castor bean proteins, 24 out of the 28 amino acids (85.7%) conserved between the Brazil nut and rapeseed proteins are also common to the castor bean protein. In addition, the positions of cysteine residues in all three proteins are conserved. However, the Brazil nut protein is unusually rich in methionine, while the castor bean and rapeseed proteins contain only about 2% methionine. Thus, a large percentage of the non-homology between the Brazil nut protein and the castor bean or rapeseed protein sequences is due to differences in their methionine contents. We have also compared the protein sequence of the Brazil nut sulfur-rich protein to that of the 15 kDa high-sulfur zein protein from maize [18], which contains about 11% methionine, and have found no significant homology between these two proteins.
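The three-way comparison above is a count over an alignment of the Brazil nut, rapeseed and castor bean sequences: among the 28 positions where the first two agree, 85.7% corresponds to 24 that also match the third. A Python sketch of that count, assuming the three sequences are already aligned to a common length; the function name is illustrative.

```python
from typing import Tuple


def shared_with_third(a: str, b: str, c: str, gap: str = "-") -> Tuple[int, int]:
    """Count alignment columns where `a` and `b` agree, and how many of those also match `c`.

    Returns (conserved_in_all_three, conserved_in_a_and_b). Gap columns are ignored.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("aligned sequences must have the same length")
    a, b, c = a.upper(), b.upper(), c.upper()
    pair_positions = [i for i in range(len(a)) if a[i] == b[i] and a[i] != gap]
    triple = sum(1 for i in pair_positions if c[i] == a[i])
    return triple, len(pair_positions)


if __name__ == "__main__":
    triple, pair = 24, 28  # counts corresponding to the 85.7% in the text
    print(f"{triple}/{pair} = {100 * triple / pair:.1f}%")
```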



Discussion 

The majority of known proteins, of both plant and animal origin, have relatively low levels of methionine, usually around 1-2%, as predicted by the theory of molecular evolution.

In the present study, we have partially sequenced an abundant protein from Brazil nuts which is exceptionally rich in methionine (18%) and have identified and sequenced a cDNA clone encoding this protein. Only one plant protein with comparable levels of methionine has been reported in the literature: Phillips and McClure recently described the amino acid composition of a 10 kDa polypeptide in a maize mutant, BSSS-53, containing 21 mol% methionine [19].

The sequence data from the Brazil nut cDNA clone as well as the data from the hybrid-selected translation experiment are consistent with previous in vitro translation studies, which have shown that the sulfur-rich protein is synthesized as a larger precursor polypeptide. The size of the polypeptide encoded by pHS-3 would be about 17 kDa, which is close to the 18 kDa value for the precursor obtained by sizing, on polyacrylamide gels, the polypeptides translated from Brazil nut RNA in vitro [2]. The correlation of the amino acid sequence obtained from the purified 9 kDa subunit with the last amino acids of the sequence derived from the nucleotide sequence indicates that the processing steps involved in the maturation of the sulfur-rich protein must take place at the amino-terminal end of the precursor. Previous in vivo labelling studies demonstrated that there are 3 distinct processing steps. First, a small peptide, most likely a signal sequence, is cleaved from the 18 kDa precursor to generate a 15 kDa polypeptide, which is subsequently processed to a 12 kDa polypeptide and then to the 9 kDa and 3 kDa subunits [2]. We have not determined experimentally the precise residues which are cleaved upon maturation of the sulfur-rich protein. Nonetheless, we can propose approximate cleavage sites that would divide the amino acid sequence into four domains corresponding to the observed polypeptides (Fig. 2A). The hydrophobic nature of the amino terminus of the encoded polypeptide (Fig. 2B) suggests that this region serves as a signal peptide. The alanine and phenylalanine residues at positions 22 and 23 would represent a possible cleavage site for a signal peptidase, as determined by the (-3,-1) rule of von Heijne [25]. A second cleavage may take place around amino acid residue 46 and would result in a polypeptide of about 12 kDa. We have attempted to determine the exact location of this cleavage site by sequencing the 12 kDa precursor polypeptide, but found that its NH2 terminus is blocked. Finally, the major (80%) partial amino acid sequence of the 9 kDa subunit would predict that the cleavage site for the third processing step is between methionine residue 69 and proline residue 70, whereas the minor sequence (20%) would indicate that the final processing site is three amino acids away, between residues 72 and 73. The 3 kDa region clipped off in this final processing step is extremely rich in methionine and presumably accumulates as the 3 kDa subunit of the mature protein. Other studies (…) indicate that the 3 kDa subunit is rich in methionine (data not shown). In addition, the amino acid composition of the sulfur-rich protein supports this notion: tyrosine and threonine residues are found in amino acid analyses of the mature protein (9 kDa + 3 kDa; data not shown). These residues are not found in the amino acid sequence derived from the nucleotide sequence of the 9 kDa subunit, but they are present in the region that immediately precedes the 9 kDa subunit.
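The proposed cleavage sites divide the 146-residue precursor into four domains. The Python sketch below assumes cleavage after residue 22 (the possible signal peptidase site), after residue 46 (the approximate second site) and after residue 69 (the major final processing site); the function name is illustrative. In this reading the segments are 22, 24, 23 and 77 residues long: the signal peptide, the region removed in the 15 kDa to 12 kDa step, the methionine-rich 3 kDa region, and the 9 kDa subunit.

```python
from typing import List, Tuple


def split_precursor(length: int, cut_after: List[int]) -> List[Tuple[int, int]]:
    """Split a precursor of `length` residues at the given cleavage sites.

    `cut_after` holds 1-based residue numbers after which the chain is cleaved.
    Returns 1-based, inclusive (first, last) ranges, one per domain.
    """
    sites = sorted(set(cut_after))
    if any(not 0 < s < length for s in sites):
        raise ValueError("cleavage sites must lie inside the chain")
    domains = []
    first = 1
    for s in sites:
        domains.append((first, s))
        first = s + 1
    domains.append((first, length))
    return domains


if __name__ == "__main__":
    for first, last in split_precursor(146, [22, 46, 69]):
        print(f"residues {first}-{last}: {last - first + 1} aa")
```

Because the second site is only approximate ("around residue 46"), the lengths of the two middle domains should be read as estimates.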

The homology between the sulfur-rich protein from Brazil nut and the seed proteins from castor bean and rapeseed is particularly striking, since these three plants are not closely related taxonomically and the castor bean and rapeseed proteins contain low levels of methionine. The proteins from all three plants consist of small and large subunits and contain high levels of cysteine. The positions of these cysteine residues are conserved, suggesting that the structural framework of these three proteins may be quite similar despite large differences in their methionine contents. This structural similarity may be conserved in the small water-soluble proteins of other oilseeds with diverse phylogenetic relationships as well.

Youle and Huang [26] noted that the levels of cysteine in proteins from different oilseeds (sunflower, mustard, linseed, lupin, cucumber, Brazil nut, hazelnut, yucca, castor bean, and cotton) were quite high and in fact very similar. Because of their high amide contents, abundance in seeds, and disappearance from seeds during germination, these low molecular weight proteins were suggested to function as seed storage proteins, with an additional and unique role in germination [26]. Of these proteins, however, only the Brazil nut 2S protein contains unusually high levels of methionine, contrary to predictions based on the theory of molecular evolution [16]. At present, we do not know why Brazil nuts might require such high levels of methionine. The soil in the Amazon region is rather poor in sulfur […]; possibly these levels of methionine are required in order to provide an adequate supply of methionine to the germinating seed. Whatever the function of the Brazil nut sulfur-rich protein, it appears that the structural framework of the 2S seed proteins is flexible enough to accommodate large numbers of methionine residues while preserving the small size, water solubility and high amide content of these proteins.

Both the castor bean protein and the rapeseed protein are analogous to the Brazil nut sulfur-rich seed protein in that they are composed of two low molecular weight subunits. In the case of castor bean, the large subunit of the protein is homologous with the 9 kDa subunit portion of the Brazil nut precursor polypeptide, while the small subunit of the castor bean protein appears to correspond to the region of the Brazil nut protein which we believe encodes the 3 kDa subunit. Interestingly, the junction between the large and small subunits of the castor bean protein corresponds to the minor cleavage site of the Brazil nut 12 kDa precursor (amino acid residue 72) (Fig. 4B). These data suggest that both subunits of the castor bean protein may be synthesized as a single precursor, similar to the Brazil nut sulfur-rich protein, and that the final processing step in the maturation of the castor bean protein may be similar to that found in the Brazil nut protein.

The processing involved in the maturation of the rapeseed protein [5] also appears similar to that of the Brazil nut sulfur-rich protein. As with the Brazil nut protein, the large subunit of napin is found at the carboxyl-terminal portion of the precursor (Fig. 4B). In both Brazil nut and rapeseed, the precursor polypeptide undergoes extensive processing before reaching its mature subunits. Comparison of the amino acid sequence of the large napin subunit with the Brazil nut protein sequence suggests that cleavage of the napin large subunit occurs at about the same point as the primary cleavage site of the Brazil nut large subunit (amino acid residue …) (Fig. 4B).

In the past, there has been much effort to enhance the sulfur amino acid content of seeds, particularly those from legumes, by conventional plant breeding approaches. The improvement in the nutritional quality of these seeds has not been significant, although breeding approaches have been successful in obtaining high-lysine corn. Studies of seed proteins in oilseeds have shown that there is a wide occurrence of abundant 2S proteins in diverse plant species. These proteins appear to be similar in their structural framework and precursor processing, seem to serve a storage function, and have a seemingly flexible amino acid composition. The fact that a large amount of methionine is localized in a single 2S protein species in Brazil nut suggests to us a molecular approach for improving the nutritional quality of seed proteins deficient in the sulfur amino acids. The cloning of a cDNA encoding this sulfur-rich protein thus represents a step toward altering the amino acid composition of seed proteins. A further understanding of the genes which encode this unusual sulfur-rich protein should provide additional useful information.
Institut für Hum…
Submitted April …



A clone containing the complete gene of the embryo-specific storage protein cruciferin has been isolated from a Brassica napus library (EMBL3). The cloned gene, cruA, has three exons and includes 5' and 3' flanking regions. The nucleotide sequence of the coding region of cruA is identical to the cDNA used to screen the library (1). A TATA box, transcription start site, translation start, and polyadenylation signals are indicated, as well as four regions 5' to the TATA box with homology to the promoter of napin, another embryo-specific gene of Brassica napus (2).



81-mer prepare…
sequence of the …
3 independent c…
containing the …
protein 2. Rat…
The insert of t…
region of 155 t…
position 468. T…
amino acids of r…
deduced amino ac…
There is a homo…
polypeptides of …
amino acid seque…
amino acids in p…

Sequence and expression of a gene encoding
an albumin storage protein in sunflower

R.D. Allen, E.A. Cohen, R.A. Vonder Haar, C.A. Adams, D.P. Ma, C.L. Nessler, and T.L. Thomas

Biology Department, Texas A&M University, College Station, TX 77843



Summary. The complete sequence of a sunflower (Helianthus annuus) gene, HaG5, encoding a 2S albumin storage protein was determined. The predicted unprocessed precursor has 295 amino acids, is rich in glutamine residues (24%) and contains a hydrophobic amino terminus that is similar to the consensus signal peptide. Amino acid sequencing of the mature protein revealed extensive post-translational processing. Nuclease protection and primer extension analysis indicated a major transcriptional start … nucleotides 5' of the predicted ATG start codon. Additional sequence data, determined from a nearly full-length cDNA recombinant, indicate that HaG5 is a member of a small gene family comprising at least two divergent genes. Comparison of the predicted HaG5 gene product with sequences of other known plant proteins revealed distant but significant homology with the napins of Brassica and other heterogeneous seed proteins in the albumin superfamily.

Key words: Sunflower – Albumin gene – DNA sequence






Introduction 

The structure and expression of plant storage protein genes have been investigated in a number of plant species (reviewed in …). In all cases, the accumulation of storage proteins during seed development and maturation requires highly regulated expression of the genes encoding them, and as such provides an excellent opportunity for studying the molecular mechanisms controlling ontogenetic gene expression. … Sunflower is well suited to such studies because the central disc of the inflorescence consists of hundreds of individual florets, each of which produces a single embryo; consequently, a single sunflower plant can yield gram quantities of developmentally staged embryos.

Sunflower embryos accumulate two major classes of storage proteins. These are the 11S globulins, soluble in NaCl, and the 2S albumins, soluble in water (Youle and Huang 1981). The 11S globulin, designated helianthinin (Schwenke et al. …), is similar to the legumin-like seed proteins of other species. … Its subunits consist of polypeptides linked by disulfide bonds, which are generated proteolytically from a larger precursor (… 1984). The structure and expression of helianthinin mRNAs have been described (…).

The synthesis, processing and accumulation of albumin seed proteins have been studied intensively in Brassica napus (…; Ericson et al. 1986), radish (Laroche-Raynal et al. …), castor bean (…) and Brazil nut (Sun et al. …). A major conclusion of these studies is that the characteristic low molecular weight, disulfide-linked albumin polypeptides found in mature seeds result from the extensive processing of larger precursors synthesized during embryogenesis. Two additional characteristics that define the 2S albumin seed storage proteins are a high amide content and a high frequency of cysteine residues (Youle and Huang 1981).

In sunflower, the 2S albumins represent more than …% of the protein present in seeds (…) and consist of two or three closely related polypeptides with molecular weights of approximately … kDa (…), which show an apparent molecular weight of … kDa when analyzed by SDS-polyacrylamide gel electrophoresis (…). In contrast, other 2S proteins are composed of large and small subunit polypeptides derived from a single precursor and linked by intermolecular disulfide bonds (…).



… during seed maturation. Sunflower albumin mRNAs, which are first detected at … DPF, accumulate rapidly in sunflower embryos, reaching maximum prevalence between 12 and … DPF. Thereafter, the albumin transcripts decrease in prevalence with kinetics similar to that observed for helianthinin mRNA (Allen et al. …). Functional sunflower albumin mRNAs are undetectable in dry seeds, germinated seedlings or leaves (…).

We describe here the complete sequence of a sunflower albumin gene, HaG5, that encodes an albumin storage protein. … The predicted unprocessed protein contains a hydrophobic amino terminus, possibly representing a signal sequence that is cleaved during processing. Amino acid sequence analysis of the mature protein indicates that further proteolytic processing occurs. The predicted mature protein is glutamine and cysteine rich. HaG5 is transcribed from a major transcriptional start … nucleotides 5' of the predicted ATG start codon and contains a single intron with characteristic eukaryotic 5' and 3' flanking consensus sequences. Sequence data determined from a nearly full-length cDNA recombinant suggest that HaG5 represents a small, divergent gene family with at least two members. HaG5 shares distant but significant homologies with a protein superfamily that includes Brassica napin.
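A hydrophobic amino terminus, seen here for the sunflower precursor and in the hydropathy plot of the Brazil nut protein, is the usual clue to a signal peptide. Neither text states which hydropathy scale or window size was used, so the Python sketch below assumes the Kyte-Doolittle scale and a sliding-window mean, a common choice for such plots; the default window of 9 residues and the function name are arbitrary, illustrative choices.

```python
from typing import List

# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> List[float]:
    """Mean hydropathy of each length-`window` stretch of `seq`.

    Element i is the average over residues i .. i + window - 1 (0-based), so
    the profile has len(seq) - window + 1 points; positive values are hydrophobic.
    """
    seq = seq.upper()
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    # Toy sequence: hydrophobic start, charged end.
    print([round(v, 2) for v in hydropathy_profile("IIIKKK", 3)])
```

A run of positive window means at the start of a precursor is the pattern the text describes; where to draw the line between hydrophobic and hydrophilic is a judgment call.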

Materials and methods 

Plant material. Sunflower seeds (H. annuus L. cv. Giant Grey Stripe; Northrup King Seed Co., Minneapolis, Minn.) were obtained commercially. Plants were field grown. Embryos were dissected from achenes at the indicated times, frozen in liquid nitrogen, and stored at -80° C.

Isolation and labeling of nucleic acids. Bacteriophage and plasmid DNAs were prepared by standard methods (Maniatis et al. 1982). Total and poly(A)+ RNA from leaves and staged sunflower embryos was prepared as described by Allen et al. (…). Sunflower genomic DNA was prepared by grinding 10 DPF embryos in liquid nitrogen, followed by lysis in SDS and organic extraction. Sunflower DNA was further purified by banding twice on CsCl in the presence of 150 µg/ml ethidium bromide. Radiolabeled hybridization probes for genomic library screening, phage recombinant mapping and genomic DNA blots were prepared by nick translating (Maniatis et al. 1975) a 0.… kb EcoRI insert prepared from the albumin cDNA recombinant (Allen et al. …; Cohen …). Probes for nuclease protection experiments were prepared by labeling the dephosphorylated termini of a 1.1 kb … fragment from HaG5 with T4 polynucleotide kinase and [γ-32P]ATP (Maniatis et al. 1982). The labeled fragment was then digested with …, and a … bp asymmetrically labeled fragment (position …) was …



Sunflower genomic DNA (>…0 kb) was partially digested with MboI. MboI fragments were size selected by sucrose density gradient centrifugation. The 10-20 kb size fraction was ligated into the BamHI site of EMBL3 (Frischauf et al. 1983), packaged in vitro and amplified on E. coli … (…). The amplified EMBL3 sunflower genomic library was screened for albumin phage recombinants by hybridization using nick-translated probes (Benton and Davis 1977). Filters were prehybridized for 4 h and hybridized for 1… h at …° C in 4× SET, 5× Denhardt's, 0.2% SDS, 100 µg/ml denatured calf thymus DNA, 50 µg/ml poly(A) and 10 µg/ml poly(C) (1× SET: … M NaCl, 0.02 M Tris, 0.002 M EDTA, pH 8.0; 1× Denhardt's solution: … bovine serum albumin, …). Filters were then washed in SET buffers containing SDS (…). Albumin recombinants were mapped …, and fragments were sequenced by the dideoxynucleotide chain termination method (Sanger et al. 1980) after ligation into M13mp18 and M13mp19 and transfection into JM… (Messing et al. 1983). Single-stranded recombinant phage DNA was processed and sequenced as described (Sanger et al. …). Additional overlapping T4 polymerase deletions of selected recombinants were prepared and sequenced as described by Dale et al. (1985). The complete sequence of contiguous sequences was assembled from these overlapping clones. Computer analyses were done on a DEC MicroVAX using the University of Wisconsin Genetics Computer Group (UWGCG) Sequence Analysis Software (…).

Transcript mapping. Nuclease mapping of the transcriptional start site of HaG5 was done as described by Favaloro et al. (1980), using a 2…0 bp EcoRI/… fragment asymmetrically labeled at the 5' terminus of the EcoRI site. Total embryo and leaf RNAs were used. The only differences in method were that the hybridizations were carried out for … h and … units of mung bean nuclease were used per reaction. Products were analyzed on polyacrylamide-urea gels with HinfI-digested, labeled pBR322 markers.

Protein sequencing. One gram of sunflower seeds was ground in 25 mM Tris-HCl, pH …, 10 mM NaCl, 1.3 mM 2-mercaptoethanol and 0.… mM phenylmethylsulfonyl fluoride (PMSF). Solubilized protein was passed over a DEAE-cellulose column equilibrated in the same buffer. Twenty micrograms of protein that failed to bind to DEAE under these conditions, representing primarily seed albumins, was collected, concentrated by lyophilization and further resolved by SDS-PAGE. The region of the gel containing the major sunflower albumin was transferred to an aminopropyl-derivatized glass fiber filter (Aebersold et al. 1986), and the first 12 amino-terminal residues of this protein were sequenced in the Texas A&M Biotechnology Support Laboratory.



Results 

Isolation and characterization of HaG5

Four sunflower genomic DNA recombinants encoding albumin seed storage proteins were isolated from a bacteriophage λEMBL[?] genomic DNA library by hybridization with an albumin cDNA probe, Ha5. This probe was previously isolated from a bacteriophage λgt11 cDNA library using a screening strategy based on the observation that albumins are expressed by [?] DPF in sunflower embryos, while 11S helianthinins do not appear until DPF [?] (Allen et al. 1987; Cohen 1986). The four independent isolates were determined to be identical based on comparisons of restriction enzyme patterns, and consequently one EMBL recombinant, designated HaG5, was selected for analysis. Figure 1 shows a restriction map of the HaG5 transcription unit.
The nucleotide sequence of HaG5 and the predicted amino acid sequence are shown in Fig. 2. An open reading frame (ORF) encoding a putative albumin storage protein begins with an ATG at position [?]; this ORF continues for 575 nucleotides, where it is interrupted by a 190-nucleotide intervening sequence. Placement of the intron was based on the discontinuity of the ORF in this region, on the presence of excellent consensus splice junctions and, most importantly, on the colinearity of the HaG5 and Ha5 sequences on either side of the intervening sequence. This single intron splits the AGG codon for arginine at amino acid position 192. Following the intron, the ORF continues for an additional [?] nucleotides, where it is terminated by a TGA stop codon at position [?]. A consensus polyadenylation signal, AATAAA, is located 23 nucleotides 3' of the stop codon at position [?]. The combined length of exons 1 and 2 is [?] nucleotides, indicating a protein coding capacity of 295 amino acid residues.
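The codon arithmetic in this paragraph can be checked directly. The sketch below is illustrative only (Python; the helper names are hypothetical) and assumes coding positions are counted from the A of the ATG, with the stop codon included in the spliced ORF length unless stated otherwise. With 575 coding nucleotides in exon 1, 575 = 3 × 191 + 2, so the intron interrupts codon 192 after its second nucleotide, which agrees with the intron splitting the AGG arginine codon at position 192. The 888- and 885-nt inputs are example values, not readings from Fig. 2.

```python
from typing import Tuple


def codon_split(exon1_coding_len: int) -> Tuple[int, int]:
    """Locate an intron relative to the reading frame.

    exon1_coding_len is the number of coding nucleotides upstream of the
    intron, counted from the A of the ATG. Returns (codon, phase): phase 0
    means the intron lies after `codon`; phase 1 or 2 means it interrupts
    `codon` after its first or second nucleotide.
    """
    full_codons, phase = divmod(exon1_coding_len, 3)
    if phase == 0:
        return full_codons, 0
    return full_codons + 1, phase


def coding_capacity(spliced_orf_len: int, includes_stop: bool = True) -> int:
    """Number of amino acid residues encoded by a spliced ORF."""
    if spliced_orf_len % 3:
        raise ValueError("spliced ORF length must be a multiple of 3")
    codons = spliced_orf_len // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    print(codon_split(575))                           # (192, 2)
    print(coding_capacity(888))                       # 295 (stop codon included)
    print(coding_capacity(885, includes_stop=False))  # 295
```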



transcription start sites defined by the 230-nt nuclease-resistant DNA fragment. Taken together, these results suggest that the transcriptional start site is located at position [?].
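The position of a start site follows from the length of the protected fragment. A minimal sketch, assuming (as is usual for this kind of assay, though not spelled out here) an antisense probe labeled at its 5' end at a downstream site, so that the protected fragment runs from the label back to the 5' end of the mRNA. Coordinates are on the sense strand; the label coordinate of 1000 in the example is hypothetical, because the map position of the labeled site is not legible in this copy.

```python
def start_site(label_pos: int, protected_len: int) -> int:
    """Map a transcription start site from a nuclease protection assay.

    label_pos: sense-strand coordinate of the labeled nucleotide of the probe.
    protected_len: length (nt) of the probe fragment protected by the RNA.
    The protected fragment covers positions start..label_pos inclusive.
    """
    if protected_len < 1 or protected_len > label_pos:
        raise ValueError("protected length must be between 1 and label_pos")
    return label_pos - protected_len + 1


# Hypothetical example: label at position 1000, 230-nt protected fragment.
print(start_site(1000, 230))  # 771
```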
are highly hydrophobic, with an average hydrophobicity value of [?]. It is likely, but unproven, that this hydrophobic domain is a signal sequence that facilitates transport of this protein into protein bodies. This "leader" sequence is probably removed during subsequent post-translational events. Using the rules defined by von Heijne ([?]), we predict that the most likely site for cleavage of this putative signal sequence is after the alanine at residue 20 (see arrow, Fig. 4).
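A rough screen for candidate cleavage positions can be written from a simplified form of von Heijne's (-3,-1) rule; his later method uses a position-specific weight matrix, which this sketch does not implement. The allowed-residue sets and the 15-35 residue search window below are assumptions chosen for illustration, and the test sequence is a toy example rather than the HaG5 leader.

```python
from typing import List

# Simplified (-3,-1) rule, assumed for illustration: the residue just before
# the cleavage site (-1) must be small and neutral, and the residue at -3
# must not be aromatic, charged or large and polar.
MINUS1_ALLOWED = set("AGSCT")
MINUS3_ALLOWED = set("AGSCTVIL")


def candidate_cleavage_sites(seq: str, min_pos: int = 15, max_pos: int = 35) -> List[int]:
    """Return 1-based positions p such that cleavage could occur after residue p."""
    seq = seq.upper()
    sites = []
    # Require p >= 3 so that -3 exists, and at least one residue after p.
    for p in range(max(min_pos, 3), min(max_pos, len(seq) - 1) + 1):
        minus1 = seq[p - 1]  # residue p (1-based)
        minus3 = seq[p - 3]  # residue p - 2 (1-based)
        if minus1 in MINUS1_ALLOWED and minus3 in MINUS3_ALLOWED:
            sites.append(p)
    return sites
```

A position passing this filter is only compatible with the rule; the hydrophobic core and the overall length of the leader still have to be considered, as the text does.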

Protein sequencing confirmed that HaG5 encodes a major sunflower albumin storage protein and further demon-
[?] experiment (Fig. 3A) revealed a major mung bean nuclease-resistant DNA fragment 230 nucleotides (nt) in length. A fragment of the same size is resistant to S1 nuclease (Fig. 3B). A second putative nuclease-resistant fragment is
an aminopropyl-derivatized glass filter (Aebersold et al. 1986). The sequence of the first 12 residues beginning at the mature N-terminus was determined in the Texas A&M Biotechnology Support Laboratory. This sequence, indicated by the box in Fig. 4, is a perfect match with the amino acid sequence predicted from HaG5 and would be expected to occur on a random basis at a frequency of less than 10^-[?].
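The chance-occurrence argument can be made explicit. Assuming, as a simplification, that all 20 amino acids are equally frequent and that positions are independent, a specific 12-residue sequence arises at a given position with probability (1/20)^12, about 2.4 × 10^-16; composition-weighted background frequencies would give a somewhat different figure. The sketch supports both cases. The exponent reported in the text is illegible in this copy, and the all-alanine peptide is only a placeholder for a 12-residue sequence.

```python
import math
from typing import Dict, Optional


def random_match_probability(peptide: str, freqs: Optional[Dict[str, float]] = None) -> float:
    """Probability that `peptide` occurs at one given position by chance.

    Positions are treated as independent. If freqs is None, each of the 20
    amino acids is taken to be equally likely; otherwise freqs maps
    one-letter codes to background frequencies.
    """
    if freqs is None:
        return (1 / 20) ** len(peptide)
    return math.prod(freqs[aa] for aa in peptide.upper())


p = random_match_probability("A" * 12)  # placeholder 12-residue peptide
print(f"{p:.2e}")                       # 2.44e-16
print(round(math.log10(p), 2))          # -15.61
```

Multiplying this probability by the number of positions searched gives the expected number of chance matches in a sequence collection.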

The predicted amino acid composition of the mature sunflower albumin is compared with that of its precursor in Table 1. As expected from the amino acid composition reported for sunflower 2S albumins (Youle and Huang 1981), the mature sunflower protein is very glutamine-rich (25%) and also has relatively high levels of cysteine ([?]%).
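Composition, net charge and pI figures of the kind discussed in this part of the Results are computed from the predicted sequence. The sketch below is a hedged illustration: the pKa values are one illustrative set chosen for this example, and published pI values (such as the 11.5 quoted for the HaG5 product) depend on the pKa set and software used, so this code is not expected to reproduce the authors' numbers exactly. The pI is located by bisection, which works because net charge falls monotonically as pH rises.

```python
from collections import Counter
from typing import Dict

# Illustrative pKa values (assumed; different tools use different sets).
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def mole_percent(seq: str) -> Dict[str, float]:
    """Mole percent of each residue type (one-letter codes)."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in sorted(counts.items())}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge, including both termini."""
    counts = Counter(seq.upper())
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    """pH at which net_charge is zero, located by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

A strongly basic protein, such as one rich in arginine, has a positive net_charge at pH 7 and a pI well above neutrality, which is the reasoning the text applies to the HaG5 product.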
Fig. 3A, B. Nuclease protection of the HaG5 transcription initiation site. A Products of the mung bean nuclease protection assay separated on a [?]% sequencing gel. B Products of the S1 nuclease protection assay separated on a 12% sequencing gel. Relative positions of size markers on both gels are indicated by numbers (nucleotides). The 230-nucleotide fragment protected by embryo RNA is indicated by arrows
[?] predicted amino acid sequence for Ha5 (Cohen 1986). Gaps were inserted to maximize the homology between the two sequences. [?] amino acid change; arrow indicates putative cleavage site. B1 and B2 indicate homologies with [?]. Box indicates sequence identical to the amino-terminal sequence of the mature sunflower albumin (see Results)
Arginine represents more than [?]% of the amino acid residues and, along with glutamate ([?].2%), accounts for the majority of the charged residues of the mature gene product of HaG5. The calculated pI of the predicted HaG5 gene product is 11.5; therefore, the protein should have a net positive charge at neutral pH. The predicted molecular weight of the mature protein is [?] kDa and is in excellent agreement with our estimates from SDS-PAGE (when
HaG5 was isolated by hybridizing a sunflower genomic DNA library with an albumin cDNA probe, Ha5 (Allen et al. 1987; Cohen 1986). Although Ha5 does not represent a complete albumin mRNA, it does share sequence homology with HaG5 over the majority of the transcription unit. Comparison of restriction maps of HaG5 and Ha5 suggested that these sequences were somewhat divergent (data not shown). The sequence divergence between HaG5 and Ha5 is more precisely illustrated in Fig. 4, which shows a compar-
is a [?]-amino-acid sequence that is present in Ha5 but not in HaG5. An additional [?]-amino-acid gap [?] the HaG5 sequence at position [?]. On either side of each gap the two amino acid sequences show sequence conservation, indicating that these sequences are properly aligned. The additional [?]-amino-acid sequence in the predicted Ha5 polypeptide is composed of a series of direct repeats (indicated by arrows), the longest being 13 residues with the sequence [?]. The significance of these direct repeats is not clear, but they may represent recent duplications in the Ha5 gene that have not yet diverged. These repetitive motifs are not present in HaG5, although a sequence that corresponds to the repeat segment [?] is present in HaG5. The discontinuities between the sequences of HaG5 and Ha5 are not the result of cloning artifacts, because in both cases multiple independent isolates have been analyzed.

The predicted gene products of HaG5 and Ha5 are identical at 14[?] of 205 residues compared (72%); this homology increases to approximately 80% if conservative amino acid changes are counted as functional homologies, e.g. glutamate to aspartate at position 105. Frequently these conservative changes separate relatively large blocks of perfectly conserved sequence, e.g. lysine to arginine at position 133, thus substantially extending regions of functional homology.
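Identity and conservative-substitution percentages like these are counted column by column over an alignment. A minimal sketch, assuming two gapped alignment rows of equal length; the grouping of "conservative" substitutions is a hypothetical choice for illustration (it includes the Glu/Asp and Lys/Arg exchanges cited above), since the paper does not list which changes it counted.

```python
from typing import Tuple

# Hypothetical grouping of "conservative" substitutions (illustration only).
CONSERVATIVE_GROUPS = (
    frozenset("DE"), frozenset("KR"), frozenset("NQ"),
    frozenset("ST"), frozenset("ILMV"), frozenset("FWY"),
)


def is_conservative(a: str, b: str) -> bool:
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_similarity(row1: str, row2: str) -> Tuple[int, int, int]:
    """Return (identical, identical_or_conservative, compared) for two
    equal-length alignment rows; columns with a gap ('-') are skipped."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    identical = similar = compared = 0
    for a, b in zip(row1.upper(), row2.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif is_conservative(a, b):
            similar += 1
    return identical, similar, compared


ident, sim, n = identity_similarity("MEKQ-L", "MDRQAL")
print(f"identity {100 * ident / n:.0f}%, with conservative changes {100 * sim / n:.0f}%")
# identity 60%, with conservative changes 100%
```

Skipping gapped columns matches the "of 205 residues compared" phrasing; counting gaps as mismatches instead would lower both percentages.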

The marked divergence between the genomic DNA sequence of HaG5 and the cDNA sequence of Ha5 indicates that HaG5 and Ha5 are part of a gene family with a minimum of two members. Hybridization of Ha5 with sunflower genomic DNA blots indicates that the albumin gene family may contain as many as four members (data not shown). These results are similar to those obtained for pea low molecular weight albumins (Higgins et al. 1986) and Brassica napin (Crouch et al. 1983; Ericson et al. 1986) and are consistent with observations on other classes of seed proteins, i.e. most major seed storage proteins are encoded by small gene families.

Discussion 

By all criteria, HaG5 represents a typical eukaryotic RNA polymerase II (Pol II) transcription unit with the expected consensus sequence elements. Within the transcribed region these include the putative translation initiation sequence, [?], at positions [?] in Fig. 2; this corresponds precisely with the translation initiation consensus sequence for maize zein genes and differs only at the last position from a consensus sequence for non-zein plant nuclear genes (Heidecker and Messing 1986). The HaG5 transcription unit contains a single 190-bp intron (see Fig. 2), and the 5' and 3' exon/intron boundaries are consistent with the consensus splice junctions compiled for animal nuclear genes (Mount 1982). A match with a plant consensus polyadenylation signal (Heidecker and Messing 1986), AATAAA, is located 23 nt 3' of the stop codon, beginning at position [?] (Fig. 2). There are no other precise matches with the consensus polyadenylation sequence; however, there is a series of five imperfect, overlapping AATAAA-like sequences beginning at position 20[?]0 and an additional imperfect match at position 228[?].
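Exact and imperfect matches to AATAAA, as described in this paragraph, can be located with a simple Hamming-distance scan. A sketch under the assumption that "imperfect" means at most one mismatch over the six positions (the paper does not define the term); only the given strand is scanned, and the example sequence is made up rather than taken from Fig. 2.

```python
from typing import List, Tuple


def find_motif(seq: str, motif: str = "AATAAA", max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Scan one strand for windows within `max_mismatches` of `motif`.

    Returns (1-based start, number of mismatches) for every hit, so
    overlapping hits are all reported."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(1 for x, y in zip(seq[i:i + k], motif) if x != y)
        if mismatches <= max_mismatches:
            hits.append((i + 1, mismatches))
    return hits


# Made-up sequence: one exact signal followed by a one-mismatch variant.
example = "GCAATAAAGCTTAATAAT"
print(find_motif(example))                    # exact matches only
print(find_motif(example, max_mismatches=1))  # exact plus imperfect matches
```

The same scan, with a different motif, can compare a candidate translation initiation context against a consensus.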

Although the 3' terminus of the HaG5 transcription unit has not been mapped, the dimensions of the transcription unit are constrained by the mature albumin mRNA
eukaryotic Pol II transcription unit [?] sequence at position [?], 26 nt from the cap site, and a CAAT homology [?] nt upstream of the start site. Among other well-characterized genes, these conserved DNA sequences are frequently implicated in control of transcriptional initiation (reviewed in Serfling et al. 198[?]). Whether these or other sequences have a similar role in the expression of the HaG5 transcription unit [?]






The predicted HaG5 gene product displays many characteristics of other seed storage proteins. For example, it is rich in cysteine and glutamine residues and also has a relatively high amount of other nitrogen-rich amino acids such as arginine and asparagine. As is typical of many seed storage proteins, a hydrophobic amino-terminus suggests that the primary HaG5 gene product is translated on the rough endoplasmic reticulum (ER) and further suggests that it may be stored in protein bodies. The mature protein has a calculated pI of 11.5 and thus at neutral pH would be soluble in water. The predicted molecular weight of the HaG5 gene product is 3[?] kDa, nearly twice the experimentally determined value of 1[?] kDa for the mature protein (Allen et al. 1987; Cohen 1986). These results, as well as sequence analysis of the amino-terminus of the mature albumin, indicate substantial processing of the primary HaG5 translation product. It is possible that the [?]-kDa precursor is cleaved into two polypeptides of approximately the same size, and the results of Cohen (1986) suggested a minor albumin protein species with an approximate molecular weight
Although the post-translational processing of the major sunflower albumin is extensive, it was not unexpected in view of the substantial processing that occurs for other known 2S albumins. The diversity of these events is noteworthy. For example, castor bean 2S albumins are synthesized as a 32 kDa precursor; this polypeptide undergoes extensive processing both in the lumen of the ER and in the matrix of protein bodies to generate large and small polypeptides linked by intermolecular disulfide bonds (Butterworth and Lord [?]). The structure of the mature castor bean 2S storage protein is similar to that observed for Brassica napin (Crouch et al. 1983; Ericson et al. 1986) and the Brazil nut sulfur-rich protein (Sun et al. 1987); however, the precursor polypeptides for napin and the Brazil nut sulfur-rich protein are substantially smaller (20 and 1[?] kDa, respectively) than the castor bean 2S precursor. The [?] and 4 kDa pea albumin polypeptides are derived proteolytically from a 13 kDa preproprotein (Higgins et al. 1986). In contrast to the previously cited examples, however, the two low molecular weight pea albumins are not disulfide-linked and may not even be associated in planta. Although not included in the 2S class of seed proteins, another major pea seed albumin is synthesized as a 26 kDa protein without a signal peptide and does not undergo significant post-translational modification

Fig. 5. Phylogenetic conservation of HaG5 sequences. Sequences sharing homology with the HaG5 B1 and B2 regions (see Fig. 4) were identified by a computer search of protein sequence databases or by inspection of sequences compiled by Kreis et al. (1985a, b). Above the line are predicted amino acid (aa) sequences including the B1 and B2 regions of HaG5 and Ha5 (see Fig. 4); immediately below the line is a consensus sequence for this region. Sequences for Brassica napin (aa [?]; Crouch et al. 1983), barley trypsin inhibitor (aa [?]; Odani et al. [?]), wheat α-amylase inhibitor (aa [?]; Maeda et al. [?]), castor bean storage protein (aa [?]; Sharief and Li [?]), maize trypsin inhibitor (aa [?]; Mahoney et al. 1984), millet trypsin/α-amylase inhibitor (aa [?]; Campos and Richardson [?]), Brazil nut 2S sulfur-rich protein large subunit (aa 1-43; Ampe et al. 1980), wheat high molecular weight prolamins (aa [?]; Forde et al. [?]) and γ-secalin (aa 86-12[?]; Kreis et al. [?]) are shown below the sunflower albumin consensus sequence and are aligned to maximize homology between the various sequences. Boxes indicate homology with the sunflower albumin consensus sequence



(Higgins et al. 1987). Processing of the sunflower albumin appears to be most like that observed for castor bean, in that it is processed from a rather large precursor polypeptide, but the resulting mature protein is larger and is composed of a single polypeptide containing one or more intramolecular disulfide linkages (Allen et al. 1987).

A computer search of protein sequence databases identified significant homologies between the predicted amino acid sequences of Ha5 and napin (Crouch et al. 1983). The sequence motif [?] is represented only once, at position 1[?]1 in the napin precursor, but is found twice in both HaG5 and Ha5. These sequence elements are designated B1 and B2 in Fig. 4. Kreis et al. (1985a, b) defined a storage protein superfamily that included napin as well as other heterogeneous seed proteins. The most significant homologies between the predicted HaG5 protein and these proteins occur in the peptide domain "B" as defined by Kreis et al. (1985a, b) and include the LQQCCNFT sequence motif. Sequences including the B1 and B2 regions of HaG5 and Ha5 were compared with characteristic sequences of this superfamily; the results of these comparisons are summarized in Fig. 5. The most striking observation is the conservation of the [?] motif in most sequences and, in particular, the invariance of the cysteine residues at the aligned positions [?] and of [?] at position [?]. In addition, the [?] residues at positions [?] and [?] are nearly [?].
[?] Kreis et al. (198[?]) and [?]. It is not clear, however, [?] the striking phylogenetic [?] common progenitor for the 2S albumins of dicots and heterogeneous monocot seed proteins, including the prolamins and various enzyme inhibitors. [?] albumin storage proteins share antigenic determinants and [?] sequence homology with [?] napin. Since [?] from the evolutionary line giving rise to angiosperms prior to the divergence of monocots and dicots (Cronquist [?]), these results provide further evidence of the evolutionary relationships between dicot albumins and various monocot seed proteins.
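The consensus line and the invariant cysteines discussed for the superfamily comparison in Fig. 5 amount to column-wise counts over an alignment. A sketch assuming pre-aligned, equal-length rows and a simple majority rule (the 0.5 threshold is an assumption, not the authors' criterion); the toy rows below are not the Fig. 5 sequences.

```python
from collections import Counter
from typing import List


def consensus(rows: List[str], threshold: float = 0.5) -> str:
    """Column-wise majority consensus of equal-length aligned rows.

    A column shows its most frequent non-gap residue when that residue occurs
    in more than `threshold` of all rows; otherwise it shows '.'."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("need at least one row, all of equal length")
    out = []
    for column in zip(*rows):
        counts = Counter(c for c in column if c != "-")
        if counts:
            residue, n = counts.most_common(1)[0]
            out.append(residue if n / len(rows) > threshold else ".")
        else:
            out.append(".")
    return "".join(out)


def invariant_positions(rows: List[str], residue: str = "C") -> List[int]:
    """1-based alignment columns in which every row carries `residue`."""
    return [i + 1 for i, column in enumerate(zip(*rows)) if all(c == residue for c in column)]


toy = ["LQQCC", "LEQCC", "MQQCA"]  # toy alignment, not the Fig. 5 data
print(consensus(toy))              # LQQCC
print(invariant_positions(toy))    # [4]
```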
[?] the Texas Advanced Technology Research Program and [?]. We thank [?] for critical review of this manuscript and for help in [?] amino-terminal sequencing.
The b-32 protein from maize endosperm, an albumin regulated by the O2 locus: nucleic acid (cDNA) and amino acid sequences

N. Di Fonzo, H. Hartings, M. Brembilla, M. Motto, C. Soave, E. Navarro, J. Palau, W. Rohde

Istituto Sperimentale per la Cerealicoltura, Sezione di Bergamo, Via Stezzano 24, I-[?] Bergamo, Italy
Università della Basilicata, Potenza, Italy
Centre d'Investigació i Desenvolupament (CSIC), carrer Jordi Girona Salgado 18, E-[?] Barcelona
Max-Planck-Institut für Züchtungsforschung, D-[?] Köln, Federal Republic of Germany



Summary. The cDNA coding for the b-32 protein, an albumin expressed in maize endosperm cells under the control of the O2 and O6 loci, has been cloned and the complete amino acid sequence of the protein derived. A lambda gt11 cDNA library from mRNA of immature maize endosperm was screened for the expression of the b-32 protein using antibodies against the purified protein. One of the positive clones obtained was used to isolate a full-length cDNA clone. By Northern analysis, the size of the b-32 mRNA was estimated to be 1.2 kb. Hybrid-selected translation assays show that the message codes for a protein with an apparent molecular weight of [?] kDa. The nucleotide sequence shows that several internal repeats are present. The protein has a length of 303 amino acid residues (mol. wt. 32430 daltons), and its sequence shows the following features: no signal peptide is observable; it contains seven tryptophan residues, an amino acid absent in maize storage proteins; polar and hydrophobic residues are spread along the sequence; several pairs of basic residues are present in the N-terminal region; and the secondary structure allows the prediction of two structural domains for the b-32 protein that would fold up, giving rise to a globular shape. The cloning of this gene may help in understanding the role of the O2 and O6 loci in regulating the deposition of zein, the major storage protein of maize endosperm.

Key words: Zein regulation – O2 – O6 – b-32 protein – cDNA cloning
Introduction

The protein b-32 of maize endosperm is a [?] with an apparent molecular weight of about 32 kDa, existing in different genotypes in two isoelectric forms, one with a pI of 5.8 and the second with a pI of 6.0. The two variants show similar amino acid compositions, but minor differences are revealed by their tryptic peptide maps. The protein is localized in the soluble part of the cytoplasm and does not bind to any particulate structure (Di Fonzo et al. 198[?]). Its expression during development is temporally and quantitatively coordinated with the deposition of storage proteins in endosperm tissue (Soave et al. 1981).
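A molecular weight such as the 32430 daltons given in the Summary for the 303-residue b-32 protein is the sum of average residue masses plus one water molecule; 32430 Da over 303 residues implies a mean residue mass of about 107 Da. The sketch below uses standard average residue masses (rounded to four decimals) for an unmodified polypeptide. Because the b-32 sequence is not legible in this copy, it is exercised only on a dipeptide.

```python
# Average residue masses in daltons (free amino acid minus H2O).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini


def average_mass(seq: str) -> float:
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


print(round(average_mass("GG"), 2))  # 132.12 (glycylglycine)
```

Apparent molecular weights from SDS-PAGE, such as the "about 32 kDa" above, are empirical and need not match the sequence-derived value exactly.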

In all maize inbreds studied so far, the b-32 protein is found, in either the acidic or the basic form, as the product of two codominant alleles; it has also been shown
that the o2 and o6 mutants lack this protein ([?]). As both mutants induce a concomitant decrease in the production of zein polypeptides and of protein b-32, it is possible that this protein can act as a [?] in regulating storage protein [?]. [?] unrelated control of both zein and b-32 proteins by the [?] gene products cannot be excluded. To evaluate the different control mechanisms [?], information on the molecular structure of the b-32 protein may shed some light on its biological role within the endosperm cells.

In this paper we report the isolation and analysis of cDNA clones prepared from mRNA of maize endosperm cells and coding for a product corresponding to the b-32 protein. This has been possible because of the availability of purified anti-b-32 sera (Di Fonzo et al. 198[?]) for the screening of a lambda gt11 expression library. The complete nucleotide sequence of the b-32 message, as well as the amino acid sequence of the b-32 protein, is reported.



Materials and methods 

Plant material. The wild-type version of the inbred W64A (Zea mays L.) was used for large-scale preparation and purification of the basic form of the b-32 protein, as well as for preparing total and poly(A)+ RNA. The o2 and o6 mutants, in the background of the line W64A, [?] when needed used to prepare [?]. In other experiments, wild-type and [?] maize lines B37 and A69Y were also used, as specified in the text. Ears were collected at [?] days after pollination, frozen in liquid nitrogen and stored at [?] °C until use.



Enzymes and chemicals. DNA restriction endonucleases, DNA polymerase I Klenow fragment, reverse transcriptase and RNase A were purchased from Bethesda Research Laboratories. [α-32P]dCTP, [α-35S]dATP, L-[35S]methionine and a [14C]-methylated protein mixture were purchased from Amersham International.

Poly(A)+ RNA. Total RNA was extracted from dissected endosperms and purified as described by Dean et al. (1983). Poly(A)+ RNA was prepared by two cycles of oligo(dT)-cellulose chromatography (Aviv and Leder 1972).

Expression library in lambda gt11. An expression library was prepared from endosperm poly(A)+ RNA using the cDNA synthesis system from Amersham International. The synthesized cDNA was size-selected by [?] agarose gel electrophoresis, and remaining linkers were removed by adsorption on DEAE filters (Whatman DE81) as described by Dretzen et al. (1981). The EcoRI-linked cDNA was ligated to dephosphorylated lambda gt11 arms (Promega Biotech) and packaged in vitro. Approximately 2 × 10^[?] plaque-forming units were obtained, of which [?]% were recombinants. The library was amplified on Escherichia coli [?] (Promega Biotech).

Antibody screening of the lambda gt11 library. Serum for screening was raised in rabbits and purified as described by Soave et al. (1981). The library was plated and, after incubation at 42 °C for 4 h, the plates were overlaid with dry nitrocellulose filters saturated with 10 mM isopropyl β-D-thiogalactopyranoside and further incubated at 37 °C for [?] h (Young and Davis 1983). After this second incubation, filters were washed with saturation buffer (PBS; 7% bovine serum albumin; 0.05% Nonidet NP40). PBS was 10 mM phosphate buffer, pH [?], [?] mM NaCl. The serum was diluted 1:100 with the saturation buffer and used for incubating the filters at [?] °C overnight. After recovering the serum, filters were washed with a solution of 10 mM phosphate buffer, pH 7.5, 1 M NaCl, 0.05% Nonidet NP40 for 1 h at room temperature and incubated for 2 h at room temperature with [125I]protein A (>30 mCi/mg, Amersham International) in saturation buffer (at 5 × 10^[?] cpm/ml). Positive clones were purified by successive cycles of antibody screening until all phages on a plate showed a positive signal.

In vitro translation and immunoprecipitation. Immunoprecipitation of in vitro translation products was performed as described by Davis et al. (1986). Proteins were analyzed by SDS-12% polyacrylamide gel electrophoresis.

Northern blot analysis. One microgram of poly(A)+ RNA was resolved by electrophoresis on a formaldehyde-agarose gel (1.3% agarose; 2.2 M formaldehyde; [?] 3-[N-morpholino]propanesulfonic acid). The gel was soaked in 20× SSC for 30 min and the RNA transferred to nitrocellulose filters (1× SSC = 15 mM sodium citrate, pH 7.0; 150 mM NaCl). The filter was hybridized according to Maniatis et al. (1982).
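The 1× SSC definition given here fixes the composition of the other strengths used in these methods (20× for transfer, 4× for washing filters). A small sketch of that arithmetic, using the standard C1V1 = C2V2 dilution relation; the helper names are hypothetical and the 500 ml final volume in the example is arbitrary.

```python
from typing import Dict

# 1x SSC as defined in the text: 150 mM NaCl, 15 mM sodium citrate (pH 7.0).
SSC_1X_MM = {"NaCl": 150.0, "sodium citrate": 15.0}


def ssc_composition(strength: float) -> Dict[str, float]:
    """Millimolar composition of an SSC solution of the given strength (e.g. 20 for 20x)."""
    return {component: mm * strength for component, mm in SSC_1X_MM.items()}


def stock_volume(stock_x: float, final_x: float, final_volume_ml: float) -> float:
    """Volume of stock (ml) needed for a dilution, from C1 * V1 = C2 * V2."""
    if final_x > stock_x:
        raise ValueError("a dilution cannot raise the concentration")
    return final_x * final_volume_ml / stock_x


print(ssc_composition(20))       # {'NaCl': 3000.0, 'sodium citrate': 300.0}
print(stock_volume(20, 4, 500))  # 100.0
```

So 20× SSC is 3 M NaCl and 0.3 M sodium citrate, and 500 ml of 4× SSC takes 100 ml of the 20× stock.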

Hybrid-selected translation. Denatured DNA (1 µg) was spotted on nitrocellulose filters with the aid of a Minifold (Schleicher and Schuell). Filters were washed with 4× SSC and baked at 80 °C under vacuum. The filters were prehybridized in [?] formamide; [?] mM piperazine-N,N'-bis[2-ethanesulfonic acid]; 0.4 M NaCl, pH 6.4; 700 µg/ml poly(A) RNA for 1 h at 52 °C. Poly(A)+ RNAs (30 µg) were hybridized for 3 h at 52 °C in 120 µl of the above buffer, except for the poly(A) RNA. Filters were then washed five times with 10 mM Tris-HCl, 2 mM EDTA, 0.5% SDS, pH [?], and five times with 10 mM Tris-HCl, 2 mM EDTA, pH 8.0. Bound RNA was eluted at [?], [?] and [?] °C in 200 µl of H2O containing 2 mM EDTA and quenched on ice. Carrier tRNA from calf liver (10 µg/ml) and 3 M sodium acetate (pH 5.6; 20 µl) were added. The samples were precipitated and washed with 70% ethanol, and the pellets were used to direct protein synthesis in the rabbit reticulocyte lysate.

Restriction endonuclease mapping. Restriction endonuclease cleavage sites were determined by single or double digests with various restriction endonucleases. Digestion products were resolved in conventional horizontal agarose gels.

Fig. 1. The electrophoretic pattern of in vitro protein synthesis directed by poly(A)+ RNA extracted from the inbred line A69Y wild type is shown in lane 1. The position of migration of b-32 (arrow) corresponds to that of dansylated purified b-32 (lane [?]). Lanes 3, 4 and 5 correspond to immunoprecipitated products from in vitro translated poly(A)+ RNA from wild type, o2 and o6, respectively. Lanes 2 and [?] were loaded with a standard set of 14C-labelled proteins

DNA sequencing. The dideoxynucleotide chain termination method of Sanger et al. ([?]) was followed, using the bacteriophage vectors M13mp18 and M13mp19.

Computer analysis. The hydrophilicity plot of the deduced amino acid sequence was obtained according to Kyte and Doolittle (1982). The prediction of the b-32 secondary structure was made according to the procedures of Garnier et al. ([?]) and Chou and Fasman ([?]) in the computer version of Parrilla et al. ([?]).
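A Kyte-Doolittle profile is a sliding-window average of per-residue hydropathy values; a hydrophilicity plot is the same curve with the sign reversed. The sketch below uses the published Kyte and Doolittle (1982) scale; the 9-residue window is an assumption, since the window used by the authors is not given here, and the function name is hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> List[float]:
    """Mean Kyte-Doolittle hydropathy over a sliding window.

    Element i is the average over residues i .. i+window-1 (0-based), so the
    profile has len(seq) - window + 1 points. Negate it for hydrophilicity.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = [KD[aa] for aa in seq]
    running = sum(values[:window])
    profile = [running / window]
    for i in range(window, len(values)):
        # Slide the window one residue: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        profile.append(running / window)
    return profile
```

Plotting each value against the central residue of its window gives the usual profile; the running sum keeps the cost linear in sequence length.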



Results 

Control of b-32 message by the O2 and O6 loci

Previous results (Soave et al. 1981; Di Fonzo et al. 198[?]) have shown the absence of the b-32 polypeptide in protein extracts of the maize endosperm mutants o2 and o6. Here we have studied to what extent b-32 mRNA is present in the two mutants (Fig. 1). Lane 1 displays the pattern of in vitro protein synthesis primed by total poly(A)+ RNA extracted from wild-type endosperms in the background of the inbred line A69Y. Lanes 2 and [?] were loaded with molecular weight markers, while lane [?] shows the position of dansylated purified b-32 (arrow). A major in vitro product corresponds to this position in lane 1. Lanes 3, 4 and 5 show, for the wild-type, o2 and o6 mRNA extracts, the in vitro translation products precipitated by an anti-b-32 antiserum. In vitro synthesis of the b-32 product is detectable only for the wild type, confirming previous conclusions on the role of the O2 and O6 loci in the control of protein b-32 in the cells of the endosperm. In lanes 3 and 5, corresponding to the wild-type and o6 extracts, a precipitation of [?] quantities of zein-type proteins can also be observed. This finding is probably due to the exceptionally large amount of zein [?] in maize endosperms.

Fig. 2 A, B. Northern blots of poly(A)+ RNA [?] for the [?] or the o2 and o6 alleles. The cDNA insert from clone λb-32.14 [?] B Densitometry scanning of the film exposed for 3 days

Fig. 3. Blot analysis of the abundance of the b-32 mRNA. Lane A: [?] µg total mRNA. Lanes B, C and D: amounts of cDNA from the λb-32.14 clone, as specified. The probe was the insert itself from the λb-32.14 clone

Fig. 4. Hybrid-selected translation experiment. The cDNA from the λb-32.14 clone was used as a probe for hybridization. Lane 1, immunoprecipitate of an in vitro translation of 1 µg poly(A)+ RNA extracted from B37 wild type; arrows indicate the position of polypeptide b-32. Lane 2, the same but not immunoprecipitated. Lane 3, standard set of molecular weight markers. Lane 4, endogenous translation products of the rabbit reticulocyte lysate. Lanes 5-7, hybrid-selected mRNAs translated after post-hybridization washes at increasing temperatures (55 °C, [?]5 °C and [?]5 °C, respectively). Positions occupied by zeins (Z), glutelin-2 ([?]) and minor components (*) are indicated

Fig. 5. Restriction endonuclease map of the cDNA insert of λb-32.[?] and sequencing strategy. The directions of sequencing from each restriction site are indicated by arrows. Scale bar: 100 bp

cDNA cloning [?]

A lambda gt11 expression library was prepared from endosperm mRNA isolated 20 days after pollination. An anti-b-32 antiserum was used to isolate cDNA inserts expressing the b-32 polypeptide. Approximately 1.3 × 10^[?] recombinant phages were analyzed by filter hybridization, and various clones showing positive signals were isolated. Six of these clones were further analyzed in detail and purified by replating and screening with the b-32 antiserum. Only the clones designated λb-32.14 and λb-32.101 were confirmed positive. Their cDNA inserts have sizes of [?] and 0.3 kb, respectively, as shown by EcoRI digestion and subsequent gel electrophoresis. The cDNA insert from clone λb-32.14 was amplified and used as a probe for Northern blot and hybrid-selected translation experiments.
Fig. 6. Nucleotide sequence and deduced amino acid sequence of the λb-32.66 cDNA insert. The polyadenylation signal is underlined; the stop codons are double underlined.



Identification of the b-32 message

Total poly(A)+ RNA was extracted from endosperms of
the wild-type, o2 and o6 versions of the inbred line B37.
As Fig. 2 shows, only in the case of the wild-type was an
mRNA positively and clearly detected after 2 h exposure
of the gel; after 3 days of exposure a small amount of
poly(A)+ RNA prepared from the o2 and o6 alleles hybridized
with the cDNA probe. It is concluded that the clone
λb-32.14 isolated from the expression library contained a
nucleotide sequence derived from an mRNA species under
the transcriptional control of the O2 and O6 loci, as is
the case for the b-32 protein. This is strong circumstantial
proof that the cDNA of the isolated clone corresponds to
a reverse transcription copy of a b-32 mRNA.

Characteristics of the b-32 mRNA

The size of the b-32 mRNA as shown by Northern blotting
of total poly(A)+ RNA is approximately 1.2 kb (Fig. 2).
This value corresponds to a protein consisting of about
300 amino acid residues. The abundance of the b-32 message
in poly(A)+ RNA extracts of wild-type endosperms
was determined by blot analysis, using for comparison increasing
amounts of cDNA insert from λb-32.14 (Fig. 3).
The results show that there are 2-3 copies of b-32 mRNA
for every 10³ copies of total mRNA.
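The abundance estimate above comes from reading the sample's hybridization signal against a dilution series of known amounts of cDNA insert. The Python sketch below illustrates that standard-curve logic under stated assumptions: the signal is taken to be linear in the amount of target, and converting a mass fraction into molecule numbers requires an assumed mean length for total mRNA. The function names and any inputs are hypothetical, not values or procedures from the paper.

```python
"""Sketch: estimate message abundance from a dot-blot standard curve.

Assumptions (not from the paper): signal is linear in target amount,
and molecule numbers scale as mass / length.
"""


def fit_line(amounts, signals):
    """Ordinary least-squares fit of signal = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal, slope, intercept):
    """Invert the standard curve to read off the amount of target."""
    return (signal - intercept) / slope


def copies_per_thousand(target_mass, total_mass, target_len, mean_len):
    """Convert a mass fraction into copies per 10^3 mRNA molecules.

    mean_len (mean length of total mRNA) is an assumed input.
    """
    target_copies = target_mass / target_len
    total_copies = total_mass / mean_len
    return 1000 * target_copies / total_copies


if __name__ == "__main__":
    # Hypothetical standards and sample signal, for illustration only.
    slope, intercept = fit_line([1, 2, 4, 8], [10, 21, 39, 80])
    target = amount_from_signal(50, slope, intercept)
    print(target, copies_per_thousand(target, 2000, 1200, 1500))
```

On this scale, the reported 2-3 copies per 10³ corresponds to roughly 0.2-0.3% of mRNA molecules.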

Hybrid-selected translation experiments using the
cDNA insert from λb-32.14 as a probe are presented in
Fig. 4. A number of appropriate controls were performed:
lane 1 indicates the position in a gel of the protein precipitated
with the b-32 antibody from in vitro translation of
poly(A)+ RNA from wild-type B37 endosperm; in the in
vitro pattern of total poly(A)+ RNA (lane 2) the major
band of 32 kDa is present; lane 4 shows the polypeptide
bands corresponding to endogenous translation products
of the rabbit reticulocyte lysate. The hybrid-selected translation
samples occupy the lanes from lane 5 onward. Post-hybridization
washings of filters with bound mRNA were carried out at
temperatures ranging from 55° to 95° C. The hybrid-selected
products were then run electrophoretically as shown in the
figure. At the lowest temperature, unspecifically hybridized
mRNA was eluted, and the bound messages mainly gave rise on
translation to zeins, glutelin-2 (RSP protein), a protein diffusing
in the gel around a position corresponding to a molecular weight
of [?] kDa, and the endogenous polypeptides of the lysate. On
raising the washing temperature, the pattern of translated
proteins gradually changes: the 32 kDa polypeptide band,
immunoprecipitable with anti-b-32 antibody, strongly increases
in intensity, whereas the other translation products observed
with washing at 55° C gradually disappear. In addition to the
b-32 mRNA, two minor components of lower molecular weight
than the b-32 polypeptide (asterisk in the figure) seem to be
still bound at 95° C. Their level, however, is by far lower
than that of b-32 mRNA.



Sequence of a full-length b-32 cDNA clone

The cDNA insert from the phage λb-32.14 was used as
a probe to rescreen the library in order to identify a full-length
cDNA clone. The rescreening yielded 46 positive
clones of different insert lengths. The largest clone (λb-32.66)
showed a cDNA insert of about 1 kb, which was
considered to correspond to a full-length b-32 cDNA clone.

The restriction map of the cDNA insert from λb-32.66
is shown in Fig. 5; its sequence was determined by the strategy
also depicted in Fig. 5. Figure 6 shows the nucleotide
sequence as well as the amino acid sequence of the protein
encoded by the largest open reading frame. The insert of
clone λb-32.66 corresponds to the expected full-length b-32
cDNA clone. There is an open reading frame of 909 nucleotides,
corresponding to a protein of 303 amino acid residues.
The translational start codon is preceded by an in-frame TGA
stop codon that would prevent the translation of any
larger polypeptide. The 3' flanking region contains a typical
polyadenylation signal located [?] nucleotides downstream
from the stop codon. The sequence of the cDNA clone
λb-32.14 was also obtained. The length of λb-32.14 was
662 bp and, within the coding region, its sequence differed
from the full-length cDNA at one position (substitution
of A by G).
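The reading-frame argument above (909 coding nucleotides, i.e. 303 codons, with an upstream in-frame stop ruling out a longer product) can be checked mechanically. The following is a minimal Python sketch, assuming a forward-strand cDNA sequence and the three stop codons of the standard genetic code; the function names are illustrative, not from the paper.

```python
"""Sketch: find the longest ATG-initiated ORF on the forward strand and
check whether an in-frame stop codon lies upstream of its start."""

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, stop_start, length_nt) for the longest ORF.

    start: 0-based index of the ATG; stop_start: 0-based index of the
    stop codon; length_nt: coding length excluding the stop codon.
    ORFs that run off the end of the sequence are ignored.
    """
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until an in-frame stop is reached.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 > len(seq):
                    break  # no stop before the end of this frame
                if j - i > best[2]:
                    best = (i, j, j - i)
                i = j + 3  # resume scanning after the stop codon
            else:
                i += 3
    return best


def has_upstream_stop(seq, start):
    """True if any codon in frame with `start` and 5' of it is a stop."""
    seq = seq.upper()
    return any(seq[k:k + 3] in STOPS for k in range(start - 3, -1, -3))
```

Applied to the full cDNA, `longest_orf(...)[2] // 3` would give the number of codons (909 // 3 = 303 in the case described above).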



Structural analysis of the b-32 polypeptide

The molecular weight of the 303-residue polypeptide, as
deduced from the sequence of the λb-32.66 clone, is
32430 daltons, which is in good agreement with values determined
by SDS-gel electrophoresis for the b-32 protein. In
addition, no sequence with the characteristics of a signal
peptide is observable after the start codon.
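A deduced molecular weight like the one above is a sum of residue masses plus one water for the chain termini. The sketch below assumes commonly tabulated average residue masses and an unmodified chain; the paper does not state which mass table was used, so small deviations from 32,430 Da would be expected.

```python
"""Sketch: polypeptide molecular weight from sequence using average
residue masses (Da). Assumes no post-translational modifications."""

AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini


def molecular_weight(protein):
    """Sum residue masses of a one-letter protein sequence, plus water."""
    protein = protein.upper()
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein) + WATER
```

As a rough check, 32,430 Da over 303 residues is an average of about 107 Da per residue, which is typical for proteins.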

In the second half of the translatable region a number
of nucleotide duplications are present, including: from 469
to 480, a GAA GAA GAA GAA sequence (i.e. Glu-Glu-Glu-Glu);
from 493 to 507, a GGA GGA GGA GGA GGT
(i.e. Gly-Gly-Gly-Gly-Gly); from 508 to 525, a GCA GAT
GCA GAT GCA GAT (i.e. Ala-Asp-Ala-Asp-Ala-Asp);
from 544 to 564, a GCG GCG GCG GCG GCG GCG
GCT (i.e. Ala-Ala-Ala-Ala-Ala-Ala-Ala); from 589 to 606,
an AAG CTG GTG AAG CTG GTG (i.e. Lys-Leu-Val-Lys-
Leu-Val); from 856 to 873, a GCC GCT GCC GCT GCC
GCT (i.e. Ala-Ala-Ala-Ala-Ala-Ala); and from 891 to 903,
a GAC AAC GAC GAT GAC (i.e. Asp-Asn-Asp-Asp-Asp).
An inverted repeat of 9 nucleotides is also observed in the
same region.

Fig. 8. cDNA-based amino acid composition of b-32 compared
to the one chemically determined, as reported in the paper
by Di Fonzo et al. (1986)
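The duplications listed above are back-to-back copies of a short unit (a codon, or a 6- or 9-nt block). A minimal Python sketch of a scanner for such tandem repeats follows; the function name, its parameters and the 1-based inclusive coordinates are illustrative choices, not the authors' method.

```python
"""Sketch: report runs where a unit of `unit_len` nucleotides is
repeated back-to-back at least `min_copies` times."""


def tandem_repeats(seq, unit_len, min_copies=2):
    """Return (start, end, unit, copies) with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        # Extend while the next block is an exact copy of the unit.
        while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
            copies += 1
        if copies >= min_copies:
            end = i + copies * unit_len
            hits.append((i + 1, end, unit, copies))
            i = end  # skip past the whole run
        else:
            i += 1
    return hits
```

Because the scan is not tied to the reading frame, a run can be reported in a shifted phase (for example AGA repeats instead of GAA repeats); restricting `i` to a given frame would avoid this.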

With respect to the amino acid sequence of this protein,
there are different features that deserve attention. Polar
and hydrophobic residues are spread along the whole chain.
The molecule can be divided approximately in two. The
extreme N-terminal region (residues 1-70) shows an enrichment
in pairs of basic residues. The C-terminal
domain is rich in repeats, either of the same residue
or of groups of two or three residues. To obtain more information
concerning the two postulated domains of the molecule,
some predictions were made of its secondary structure
(Fig. 7). The upper part of the figure shows the hydrophilicity
plot of the polypeptide chain. It can be observed that,
within the N- and C-terminal domains of the b-32 protein,
hydrophobic and hydrophilic segments alternate. A small
zone around residue 160 divides the two regions; this zone
corresponds to a highly hydrophilic sequence very rich in
acidic residues (6 out of [?] are Glu or Asp) that should
be flexible and located at the surface of the b-32 protein
molecule. To predict the b-32 secondary structure, two
procedures were followed. The structures obtained
with both procedures coincide for most of the segments
with compact secondary structures. The lower part of Fig. 7
shows the predicted alpha, beta and turn structures along
the b-32 polypeptide chain. One can also observe the existence
of a central region, probably poorly structured, separating
the N- and C-terminal domains, which are rich in
secondary structure motifs. These two regions have all the
requirements to fold up, giving rise to two well-defined
structural domains of the molecule.
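A hydrophilicity plot of the kind described above averages a per-residue scale over a sliding window. The paper does not name the scale or the window it used; this Python sketch assumes the Hopp and Woods (1981) scale and an odd window of 7 residues, and the function name is illustrative.

```python
"""Sketch: sliding-window hydrophilicity profile.

Assumes the Hopp-Woods scale and an odd window; neither is stated in
the paper.
"""

HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2,
    "Q": 0.2, "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5,
    "C": -1.0, "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8,
    "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(protein, window=7):
    """Return (centre_position_1based, mean_score) for each full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    protein = protein.upper()
    half = window // 2
    return [
        (i + half + 1,
         sum(HOPP_WOODS[aa] for aa in protein[i:i + window]) / window)
        for i in range(len(protein) - window + 1)
    ]
```

Peaks in the profile mark hydrophilic, acidic stretches such as the one around residue 160, where Glu and Asp (each 3.0 on this scale) dominate.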



Discussion 

The results presented in this paper strongly indicate that
the cloning strategy adopted was successful in isolating
cDNA sequences spanning an open reading frame coding
for protein b-32. In particular, it has been shown that:
(1) the clone hybrid-selects an mRNA coding for a protein
of the expected size that is correctly recognized by an
anti-b-32 antiserum; and (2) the level of the specific mRNA
observed in a Northern blot experiment was very low
in the o2 and o6 mutants, as expected on the basis of the
absence of b-32 protein in these genotypes.

The amino acid composition derived from the sequence
shows a good similarity, although not a perfect coincidence,
with that determined for the purified b-32 protein (Di
Fonzo et al. 1986; Fig. 8). The differences noted for a few
amino acids are easily explained by artifacts inherent
in the chemical determination of amino acid content, such
as the level of purity of the protein and differential losses of
amino acids during acid hydrolysis. In the protein b-32,
2.0% tryptophan was found, a value which is in contrast
to the lack of this amino acid in storage proteins
(Mossé et al.).
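Comparing a sequence-derived composition with a chemically determined one, as in Fig. 8, comes down to computing mole percentages from the deduced sequence and subtracting. A minimal Python sketch follows; the function names and the shape of the `measured` dictionary are assumptions for illustration.

```python
"""Sketch: mole-% amino acid composition from a protein sequence and
its difference from a measured composition."""

from collections import Counter


def mole_percent(protein):
    """Return {residue: mole %} for a one-letter protein sequence."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


def composition_differences(derived, measured):
    """Measured minus sequence-derived mole %, for every residue in either."""
    residues = sorted(set(derived) | set(measured))
    return {aa: measured.get(aa, 0.0) - derived.get(aa, 0.0) for aa in residues}
```

Systematic negative differences for particular residues would be consistent with hydrolysis losses; tryptophan, for instance, is largely destroyed by standard acid hydrolysis.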



Following the structural analysis of the b-32 deduced
sequence, the b-32 protein appears to be a typical globular
protein. Its mRNA level in developing maize endosperm is in
the range of an average value for messages coding for endosperm
albumins and globulins. Despite the relative abundance of
the protein, we believe it may play a direct regulatory role
in zein synthesis. Based on genetic evidence, the b-32 protein
was credited with such a positive regulatory role in
zein deposition (Soave et al. 1981; Di Fonzo et al. 1986).
Further studies may substantiate this assumption and, particularly,
could reveal whether, as postulated, the b-32 polypeptide
is actually the gene product of the O6 locus.
Structure of a Gene Encoding the 1.7 S Storage Protein, Napin, from
Brassica napus*
A rapeseed chromosomal region containing a gene
(napA), which encodes the 1.7 S seed storage protein
(napin), was isolated in several overlapping recombinant
clones from a phage λ genomic library. Following
restriction enzyme mapping of the genomic region, a
subclone containing the napA coding region as well as
some 1.1 and 1.4 kilobases of DNA from the 5' and 3'
regions, respectively, was mapped and sequenced. The
gene turned out to lack introns. Southern blotting analyses
utilizing a napin cDNA clone as a probe revealed
the presence of on the order of 10 napin genes in the
rapeseed genome. The major polyadenylated transcript
encoded by these genes was shown to be an 850-nucleotide
species, the initiation site of which was
mapped onto the napA gene. The major initiation site
for transcription is located some 33 nucleotides downstream
from a sequence perfectly conforming to the
consensus sequence of a TATA box. Further analyses
of the sequence revealed several features that may be
of relevance for the expression of the napin genes.
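Locating a consensus TATA box and its spacing to a mapped cap site, as in the abstract above, is a small motif search. This Python sketch assumes the common TATAWAW consensus (W = A or T) and measures the distance from the first T of the motif to the cap site; the paper may count the spacing differently, and the function name and `max_upstream` window are illustrative.

```python
"""Sketch: find consensus TATA boxes upstream of a mapped cap site."""

import re

# TATAWAW consensus (W = A or T); the lookahead also finds overlapping hits.
TATA = re.compile(r"(?=(TATA[AT]A[AT]))")


def tata_boxes_upstream(seq, cap_index, max_upstream=60):
    """Return (index, motif, distance) for TATA matches 5' of the cap.

    cap_index is the 0-based position of the transcription start site;
    distance runs from the first T of the motif to the cap site.
    """
    seq = seq.upper()
    hits = []
    for m in TATA.finditer(seq):
        distance = cap_index - m.start()
        if 0 < distance <= max_upstream:
            hits.append((m.start(), m.group(1), distance))
    return hits
```

A search window of a few dozen nucleotides is a deliberate choice here: TATA boxes typically sit roughly 25-35 nucleotides upstream of the start site, so distant matches are unlikely to be functional.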



Napin, or the 1.7 S protein, is one of the major seed storage
proteins in Brassica napus. It is expressed in a tissue-specific
manner, apparently under the influence of abscisic acid
(Crouch and Sussex, 1981; Crouch et al., 1983). The mature
protein, which is rather basic, consists of two subunit polypeptides
that are linked by disulfide bridges (Ericson et al.,
1986; Lonnerdal and Janson, 1972). Comparison of amino
acid sequences of the subunits with the sequence of a cDNA
clone has shown that the initial translation product, a 20-kDa
precursor, contains both the subunit polypeptides as well as
polypeptide stretches that are removed during the maturation
of the protein (Ericson et al., 1986). By analogy with other
storage proteins, the final product is thought to reside in
specialized organelles, protein bodies, within the seed cells
(Larkins and Hurkman, 1978). As far as is known, the sole
function of napin is to serve as a nutrient source during
germination and initial development of the seedling. Confirmatory
evidence that napin, like other storage proteins, possesses
minor heterogeneities in the amino acid sequence stems
from protein separation data (Lonnerdal and Janson, 1972)
as well as protein sequencing (Ericson et al., 1986) and the
analysis of cDNA clones (Crouch et al., 1983; Ericson et al.,
1986). As an initial step toward an increased understanding
of the regulation of napin genes, we have isolated and sequenced
a member of what turns out to be a small gene family.

* This work was supported by The Swedish Research Council for
Natural Sciences, The Swedish Research Council for Forestry and
Agriculture, and the Stiftelsen Brinkgarden. The costs of publication
of this article were defrayed in part by the payment of page charges.
This article must therefore be hereby marked "advertisement" in
accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted
to the GenBank™/EMBL Data Bank with accession number(s)
J02798.

MATERIALS AND METHODS AND RESULTS 1 

DISCUSSION 

We have isolated and sequenced a gene encoding napin.
The gene is a member of a small family with some 10 genes.
Transcription of an as yet unknown number of these genes
yields an 850-nucleotide-long mRNA, the cap site of which
was mapped onto the napA sequence. We have compared our
sequence with that of another napin gene, pGNA, as well as
with previously sequenced cDNA clones (Crouch et al., 1983;
Ericson et al., 1986). The napA sequence is completely identical
to the pNAP1 cDNA clone that we have previously
sequenced (Ericson et al., 1986). This makes us rather confident
that we have sequenced an expressed copy of the napin
gene family, although we have no formal proof that this is the
case.

Comparison with the pGNA gene sequence revealed that,
apart from single nucleotide changes, a quite frequently occurring
divergence in the coding region is the insertion of one or
two triplets in pGNA relative to napA. These occur in four
and two instances, respectively (data not shown). Apart from
one previously reported triplet deletion in the pN1 cDNA
clone (Crouch et al., 1983), these are the first examples of
differences that affect the length of the primary sequence of
the translated napin product. The number of nucleotide
changes in the coding region is also higher when comparing
napA with pGNA than with any of the previously sequenced
cDNA clones (data not shown). It is interesting to speculate
whether these observations may be related to the fact that B.
napus is an amphidiploid of Brassica campestris and Brassica
oleracea. It might be expected that the genes derived from one
of the respective parental species would be more homologous
to each other than when comparing across the parental border.
We are presently attempting to assign the parentage of isolated
napin genes by comparison with Southern blots of genomic
DNA from the three species. Preliminary data 2 indicate that
the napA gene most likely is derived from B. oleracea.
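The comparison above separates single-nucleotide changes from insertions of whole triplets, which keep the reading frame intact. Given two sequences already aligned with `-` gaps, a minimal Python sketch can tally both; the function name and output keys are hypothetical, and adjacent gaps on opposite strands are merged into one run for simplicity.

```python
"""Sketch: count substitutions and in-frame (triplet) indels between two
pre-aligned nucleotide sequences that use '-' for gaps."""


def compare_aligned(a, b):
    """Return substitution count, gap-run lengths, and triplet indels.

    triplet_indels lists, for each gap run whose length is a multiple
    of 3, the number of codons inserted or deleted.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    substitutions = 0
    gap_runs = []
    run = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            run += 1  # extend the current gap run
            continue
        if run:
            gap_runs.append(run)
            run = 0
        if x != y:
            substitutions += 1
    if run:
        gap_runs.append(run)
    return {
        "substitutions": substitutions,
        "gap_runs": gap_runs,
        "triplet_indels": [r // 3 for r in gap_runs if r % 3 == 0],
    }
```

Gap runs whose length is not a multiple of 3 would indicate frameshifts, which the comparison described in the text does not report.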



1 Portions of this paper (including "Materials and Methods," "Results,"
and Figs. 3 and 4) are presented in miniprint at the end of this
paper. The abbreviations used are: SDS, sodium dodecyl sulfate; kb,
kilobase; dNTP, deoxynucleotide triphosphate; AMV, avian myeloblastosis
virus; hn, heterogeneous nuclear. Miniprint is easily read
with the aid of a standard magnifying glass. Full size photocopies are
available from the Journal of Biological Chemistry, 9660 Rockville
Pike, Bethesda, MD 20814. Request Document No. 86M-4366, cite
the authors, and include a check or money order for $3.20 per set of
photocopies. Full size photocopies are also included in the microfilm
edition of the Journal that is available from Waverly Press.

2 M. L. Ericson, unpublished data.
Fig. 1. Genomic restriction fragments hybridizing with napin
cDNA sequences. Genomic DNA was cut with restriction
enzymes. The generated fragments were separated and blotted onto
nitrocellulose filters as described under "Materials and Methods."
Nick-translated pNAP1 cDNA was used as a probe in hybridization
to these filters. The enzymes used were B, BamHI; E, EcoRI; H,
HindIII; and P, PvuII. The size marker (M) used was an end-labeled
digest of phage λ DNA. Sizes of the marker bands were (from
top to bottom): 7242, 6369, 5686, 4822, 4324, 3675, 2323, 1929,
1264, and 702 base pairs.






Fig. 2. Northern blotting and hybridization of rapeseed
mRNA to pNAP1 cDNA. mRNA was purified and separated on
denaturing agarose gels as described under "Materials and Methods."
After transfer to nitrocellulose filters the immobilized mRNA was
hybridized to a nick-translated cDNA probe. R denotes the RNA
lane; M, the marker lane. The marker used was a denatured HinfI
digest of pBR322. The autoradiogram reveals the marker bands
hybridizing to nick-translated pUC118. The sizes of the bands are
1632 and 517/506 nucleotides, respectively.



Fig. 5. Transcript cap site mapping of napin mRNA. An 18-mer
oligonucleotide, complementary to a napin sequence just downstream
from the initiation codon, was synthesized. This synthetic
oligonucleotide, 32P end-labeled and unlabeled in the respective cases,
was annealed to either mRNA or M13 DNA covering this region on
the minus strand. In separate reactions the primer was allowed to be
elongated to the 5' end of the napin transcripts or to prime a standard
set of sequencing reactions. The products were separated on a gradient
sequencing gel. Lane R shows the terminated forms that were elongated
on the mRNA; lanes A, C, G, and T, the respective sequencing
reactions.



With regard to the primary translation product, comparisons
of all the known sequences have made us aware of an
interesting repeated structure in the removed parts of the
napin precursor. All of the previously sequenced cDNA
clones and the two genomic clones discussed here conform to
this structure. It consists of a stretch of 5 or 6 amino acids
made up of hydrophobic and negatively charged amino acids.
These sequences in napA are shown boxed in Fig. 6. One of the
negatively charged amino acids is only present in the first copy
of the repeat, which occurs in the amino-terminal part of the
precursor sequence, before the small subunit. The second copy of
the repeat occurs within the removed sequence which is present
between the small and large subunits. These two repeats in fact carry
almost all of the negative charges that are contained in the
processed parts of the precursor (Ericson et al., 1986). It is
possible that these repeats are involved in processes relevant
for the translocation, intracellular transport, and/or deposition
of napin into protein bodies. Alternatively, they could
serve as signals in the proteolytic processing steps necessary
for the generation of mature napin. However, confirmation of
a possible role of these repeats in the above processes will
have to await experiments directly aimed at these points.

We have noted several interesting features in the sequence
of napA (and pGNA) that may be of relevance to different
aspects of gene regulation. It is tempting to speculate that the
5' hairpin region and the TACACAT repeat region may be
directly involved in the transcriptional activation of the gene
and that the 3' hairpin region may be involved in the termination
of transcription. There is ample precedent in the
literature for the former point, i.e. degenerate (or nondegenerate)
repeats as well as alterations in DNA topology (possibly
manifesting themselves in cruciform structures) have been implicated
in gene regulation in several systems (Gidoni et al.; Hall
et al., 1982; Harland et al., 1983; Serfling et al., 1985). It
appears more doubtful what role hairpin loops may play in
termination of RNA polymerase II transcripts (Birnstiel et
al., 1985), although they may be involved in the termination
of specific sets of genes (Hentschel and Birnstiel, 1981). In
this context it is worth noting that the napA gene has several
A/T-rich clusters downstream of the poly(A) addition site. As
an alternative, these could fulfill a function as terminator
signals.

Fig. 6. Sequence of the napA gene. The figure shows the sequence of
the 3.3-kilobase HindIII-BglII fragment. The symbols used are all
described and discussed in the text.
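Hairpin-forming regions like those discussed above are, at the sequence level, inverted repeats: a stem whose reverse complement recurs a short loop downstream. This Python sketch finds perfect inverted repeats only; it assumes exact base pairing, ignores folding energetics and G·U pairs, and its function names and parameters are illustrative rather than the authors' procedure.

```python
"""Sketch: find perfect inverted repeats (candidate hairpin stems)."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of an A/C/G/T sequence."""
    return s.upper().translate(COMPLEMENT)[::-1]


def inverted_repeats(seq, stem, max_loop, min_loop=3):
    """Return (start, end, loop_len), 1-based inclusive, for each hit.

    A hit is a stem of length `stem` followed, after a loop of
    min_loop..max_loop nucleotides, by its exact reverse complement.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        target = revcomp(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break  # right arm would run past the sequence end
            if seq[j:j + stem] == target:
                hits.append((i + 1, j + stem, loop))
    return hits
```

Allowing mismatches in the stem, or scoring candidates by free energy with a dedicated folding tool, would be the natural next step; the exact-match version keeps the idea visible.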

The determination and analysis of the nucleotide sequence
of the napA gene have revealed features which we suggest
may be related to gene regulation. Still, an increased understanding
of gene regulation in the case of napin will undoubtedly
have to await data regarding (a) co-regulated genes (e.g.
cruciferin (Simon et al., 1985)), (b) a functional definition of
the cis sequences by in vitro mutagenesis and transformation
studies, (c) a definition of trans-acting factors either by the
study of regulatory mutants or by studying DNA binding
proteins, and (d) studies on how the abscisic acid response
is mediated. The isolation and characterization of the napin
gene described in this paper facilitate studies aimed at solving
some of these questions.
Fig. 7. Alignment of the napA promoter region and the promoter
region of the pGNA napin gene. The nucleotide sequences of the
promoter regions of napA and the pGNA napin gene were aligned by
use of the ALIGN program (Dayhoff et al., 1979) run with the UN
matrix, a break penalty of 2 and 100 random runs. CAC trinucleotides
are boxed and perfect or degenerate versions of the TACACAT repeats
are indicated by arrows. The TATA box and initiation ATG are boxed
for reference. The major transcription cap site is indicated by an
arrow. Brackets at the 5' end encompass sequences with a tendency
to form hairpin loops.
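The caption's "100 random runs" refers to judging an alignment score against scores obtained after randomizing one sequence. This Python sketch mimics that idea with a Needleman-Wunsch global score, assuming the UN matrix means match = 1 and mismatch = 0, and charging the break penalty of 2 per gapped position; ALIGN's own gap treatment may differ, so this illustrates the shuffle-and-compare principle rather than reimplementing the program. Function names are illustrative.

```python
"""Sketch: global alignment score plus a shuffle-based z-score."""

import random
import statistics


def nw_score(a, b, match=1, mismatch=0, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def shuffle_z(a, b, runs=100, seed=0, **score_kw):
    """Return (real_score, z) where z compares against shuffled copies of b."""
    rng = random.Random(seed)  # fixed seed keeps results reproducible
    real = nw_score(a, b, **score_kw)
    letters = list(b)
    scores = []
    for _ in range(runs):
        rng.shuffle(letters)  # preserves base composition of b
        scores.append(nw_score(a, "".join(letters), **score_kw))
    sd = statistics.pstdev(scores)
    z = (real - statistics.mean(scores)) / sd if sd else float("inf")
    return real, z
```

Shuffling preserves base composition, so a high z-score indicates similarity beyond what the A/T richness of promoter regions alone would produce.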



Structure of a Gene Encoding the 1.7 S Storage Protein, Napin,
from Brassica napus

by

Josefsson, L.-G., Lenman, M., Ericson, M. L., and Rask, L.



Plants

Brassica napus seeds of a dihaploid variety were generously
provided by Svalöv AB, Sweden. This seed line was used
throughout these studies.

Isolation of DNA

Frozen leaf tissue was homogenized along with solid CO2 in
a Waring blender. When the powder started to thaw, 100 ml of
[?] mM Tris-HCl, pH 8.0/10 mM EDTA/1% SDS (sodium dodecyl
sulfate) were added and the suspension was incubated at 60 °C
for 20 min with gentle agitation. This was followed by two
gentle extractions with a 24:1 mixture of chloroform:isoamyl
alcohol. The aqueous phase was retained and the DNA
precipitated. The precipitate was collected by centrifugation,
rinsed with 70% ethanol, dried, and resuspended in TE
(10 mM Tris-HCl, 1 mM EDTA). RNA was degraded by an
incubation for 30 minutes at 37 °C after the addition of
RNase A to a final concentration of 100 µg/ml. [?] volumes of
a solution containing 50 mM Tris-HCl, pH 7.5/0.4 M EDTA/[?]%
SDS/1 mg/ml proteinase K were then added and the mixture
incubated at 50 °C for 30 minutes. The mixture was then
extensively dialysed against TE. After dialysis the DNA was
extracted twice with chloroform:isoamyl alcohol and then
precipitated. The DNA was rinsed with 70% ethanol, lightly
dried and resuspended in TE. This procedure yielded easily
restrictable DNA with a mean size of some 70 kb as determined
by low voltage electrophoresis in a [?]% agarose gel.

Southern blotting

Portions of rapeseed DNA were digested to completion with different restriction enzymes and loaded on 0.7% agarose gels run with a TBE (Tris/borate/EDTA) buffer system (Maniatis et al., 1982). After light staining with ethidium bromide the gel was depurinated in HCl for [illegible] minutes. After the depurination the DNA in the gel was denatured and transferred to nitrocellulose filters as described (Maniatis et al., 1982). The subsequent treatment of the filters was also according to Maniatis et al.

Isolation of RNA and Northern blotting

mRNA was isolated as described by Ericson et al. Denaturing agarose gels were prepared and run according to Maniatis et al. (1982). [illegible] µg of denatured mRNA was loaded on a [illegible]% agarose/formaldehyde gel and subjected to electrophoresis. Transfer of the mRNA to nitrocellulose filters and the subsequent treatment of the filters were according to standard procedures (Maniatis et al., 1982).

Nick-translation and hybridization to Southern blots, Northern blots and screening filters

0.1-0.[illegible] µg portions of pNAP1 cDNA were nick-translated to obtain radioactively labelled probes. Prehybridizations and hybridizations were done with formamide-containing solutions according to standard protocols (Maniatis et al., 1982). Washing of filters was done at high stringency, i.e. [illegible] mM sodium citrate-HCl, pH 7.[illegible]/10 mM NaCl/0.[illegible]% SDS at [illegible]°C, two times 1 h. Filters were exposed on X-ray film with intensifying screens at -[illegible]°C.

Construction, amplification and screening of a library for napin genes

Isolated DNA was partially degraded with [illegible] under conditions that predominantly yielded fragments in the size range [illegible] kb. DNA molecules of this size class were further purified by fractionation on 5-[illegible]% sucrose gradients in NaCl that were run for [illegible] h at [illegible] rpm in a [illegible] rotor. The fractions containing DNA were pooled, the DNA precipitated, resuspended and phosphatase treated to reduce the risk of insert concatenation during ligation. After removal of the phosphatase by extraction, the DNA was precipitated, pelleted, rinsed and resuspended in TE to a concentration of 1.25 µg/µl. The EMBL3 vector DNA was double cleaved with BamHI and EcoRI and the linker piece removed by isopropanol precipitation of the DNA (Frischauf et al., 1983). It was resuspended to a concentration of 0.4 µg/µl. Extracts for packaging lambda in vitro were prepared according to [illegible].

Conditions of ligation and packaging of lambda particles were first investigated on a small scale prior to preparation of the library. Conditions finally chosen for the library construction were the following: 4 µg of vector and 2.[illegible] µg of insert DNA were ligated and packaged in vitro after dividing into 10 tubes, each containing [illegible] µl buffer A ([illegible] mM Tris-HCl, pH [illegible]/[illegible] β-mercaptoethanol/1 mM EDTA) together with [illegible] Tris-HCl/spermidine/MgCl2/ATP/β-mercaptoethanol, 17 µl sonication extract and 2[illegible] µl freeze/thaw lysate. This yielded a total of 2.[illegible] x 10^[illegible] plaques.

The library was subsequently amplified on 10 large screening plates of 23 x 25 cm. Clones of the amplified library were screened by spreading [illegible] pfu on each screening plate. Replica nitrocellulose filters were prepared and pretreated as described (Maniatis et al., 1982); conditions for hybridization are given above. Recombinant phages were purified by two consecutive rescreenings on regular 10 cm plates. Growth of recombinant phages and purification of phage DNA were according to established protocols (Maniatis et al., 1982).

Mapping of genomic clones and subcloning

Lambda recombinant clones were mapped by combining the procedure of Rackwitz et al. (1984) with a set of complete digestions, with SalI alone or with SalI together with either of six other restriction enzymes. Southern blots from gels on which the latter digestions were analysed were prepared and hybridized with labelled pNAP1 cDNA. Subcloning of a fragment containing the napA gene was done by purification of the fragment on an agarose gel cast with low gelling temperature agarose. The purified fragment was subcloned into pUC19 (Yanisch-Perron et al., 1985). The subclone was mapped by conventional digestion/double digestion techniques (Maniatis et al., 1982).
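Mapping from single and double digests amounts to locating recognition sites and reading off fragment lengths. The sketch below illustrates that logic on a known sequence; it is not the authors' procedure (which worked from Southern blots of unsequenced phage DNA). The five recognition sites and cut offsets are the standard ones, while `ENZYMES`, `fragment_sizes` and the toy sequence are hypothetical names and data chosen for illustration.

```python
import re

# Recognition site and top-strand cut offset for each enzyme (e.g. EcoRI G^AATTC).
# All five sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
    "SalI": ("GTCGAC", 1),
    "BglII": ("AGATCT", 1),
}


def cut_positions(seq, enzymes):
    """Sorted top-strand cut coordinates (0-based) for the chosen enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # A zero-width lookahead also reports overlapping occurrences.
        for m in re.finditer(f"(?={site})", seq):
            cuts.add(m.start() + offset)
    return sorted(cuts)


def fragment_sizes(seq, enzymes):
    """Fragment lengths from a complete digest of a linear molecule."""
    bounds = [0] + cut_positions(seq, enzymes) + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical toy sequence (not napin DNA): one EcoRI and one BamHI site.
EXAMPLE_SEQ = "AAAA" + "GAATTC" + "T" * 10 + "GGATCC" + "CCC"

print(fragment_sizes(EXAMPLE_SEQ, ["EcoRI"]))           # [5, 24]
print(fragment_sizes(EXAMPLE_SEQ, ["EcoRI", "BamHI"]))  # [5, 16, 8]
```

Comparing the single digest with the double digest places the BamHI site inside the larger EcoRI fragment. Pairing SalI with a second enzyme follows the same reasoning.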

Nucleotide sequencing

Nucleotide sequencing was performed according to Sanger et al. (1977) with [35S]α-thio-dATP as the labelled nucleotide. The M13 vectors used were [illegible] (Yanisch-Perron et al., 1985). Both the shotgun procedure (Bankier and Barrell, 1983) and directed subcloning into M13 of fragments derived by restriction enzyme digestion were used. The sequence data were analysed with the programs of Staden (1980, 1982).
The mixture in addition contained: [illegible] U human placental RNase inhibitor, [illegible] mM Tris-HCl, pH [illegible] (measured at 42°C), [illegible] mM NaCl and [illegible] mM MgCl2. After annealing for 1 h at room temperature, unlabelled dNTPs to a final concentration of 200 [illegible] each and 1.5 units of AMV reverse transcriptase were added. The sample was then incubated at 42°C for 20 min and subsequently treated as a regular sequencing gel sample. Approximately 1 µl of the mixture (1000 cpm) was loaded onto the gel and run alongside a reference set of sequencing reactions.

Data bases

The three major data bases NBRF, EMBL and GenBank were used in the sequence comparisons.



Southern and Northern blotting analyses

As an initial step towards defining the complexity of the rapeseed genome with regard to napin genes, we decided to use pNAP1, a cDNA clone which encodes napin (Ericson et al.), as a radioactive probe in Southern blotting analyses. 10 µg portions of total rapeseed DNA were digested to completion, in separate reactions, with four different restriction enzymes. Following separation of the generated DNA fragments on agarose gels, the fragments were denatured and transferred to nitrocellulose filters. Hybridization of nick-translated pNAP1 cDNA to the filters yielded the pattern shown in Figure 1. The different enzymes yielded between 5 and 11 hybridizing bands. Since it is not known to what extent the enzymes may cut within individual napin genes, there is no way of deducing an exact gene number. Nevertheless, considering the data as a whole, it appears reasonable to assume that there are in the order of 10 genes for napin. How many of these hybridizing bands represent expressed napin genes is at present not clear.

Irrespective of the fact that several genes may be expressing napin, one well defined, major napin mRNA species was evident when rapeseed embryonal RNA was subjected to Northern blotting with the cDNA probe (Figure 2). In addition to the major 850 nucleotide transcript, a diffuse population of RNA species is also evident. This ranges in size from approximately [illegible] to [illegible] nucleotides and as a whole constitutes quite a significant fraction of the total hybridizing material. We cannot at present determine whether these larger RNAs represent a vast population of differently polyadenylated species of napin transcripts or simply are contaminating RNA which has not yet been polyadenylated. In light of the fact that polyadenylation appears to be a much more site-specific process than transcriptional termination (Birnstiel et al., 1985), we favour the latter explanation.



Isolation and restriction mapping of napin genomic clones



A genomic phage library was constructed with DNA from a dihaploid line of B. napus. Screening of [illegible] recombinants with the pNAP1 cDNA clone as the probe yielded eight positive clones. DNA was prepared from these clones after they had been purified by two consecutive rescreenings. Mapping of the genomic clones showed that four of the positive recombinants were overlapping clones containing the same gene, which we have designated napA. Figure 3 displays the restriction map of this region, as well as the individual clones that cover that region. A [illegible] kb HindIII-BglII fragment hybridizing to the pNAP1 cDNA probe was subcloned into plasmid pUC19 (Yanisch-Perron et al., 1985) and further mapped by conventional techniques (Maniatis et al., 1982). Figure 4 shows the map that was obtained and a comparison with the pNAP1 cDNA restriction map.

It has been shown in other plant gene systems that the signals involved in regulating transcriptional initiation are usually contained within sequences that are located reasonably close to the transcribed part of the gene ([illegible] et al., [illegible]; Morelli et al., 1985). Thus, we considered it likely that all the linked sequences involved in transcriptional regulation were contained in this subclone, and consequently decided to sequence the whole insert of the subclone.

Sequencing of the napA gene

The entire sequence of the [illegible] kb fragment was determined in overlapping sequence reactions on both strands by a combination of "shotgun" sequencing and sequencing of individual, restriction enzyme-derived M13 subclones. Both the universal 17-mer sequencing primer and synthetic oligonucleotides complementary to sequences within the subclones were used to obtain the complete sequence. The sequencing strategy is represented in a schematic fashion below the restriction map in Figure 4. This represents a minimum estimate of the sequence data that were collected: sequences that were well represented in the "shotgun" clones, in particular the transcribed region, were determined with a much higher frequency than is apparent from the figure. In addition, many individual reactions were performed more than once.
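The goal stated above, a sequence determined on both strands, can be checked by tallying read coverage per strand. The following is a minimal sketch, assuming each reaction is recorded as a hypothetical `(start, end, strand)` interval on the insert. It shows the bookkeeping only and is not the authors' software.

```python
def strand_coverage(length, reads):
    """Number of reads covering each position, kept separately per strand.

    reads: iterable of (start, end, strand); 0-based, end-exclusive
    coordinates on the insert, strand "+" or "-".
    """
    cov = {"+": [0] * length, "-": [0] * length}
    for start, end, strand in reads:
        for i in range(max(0, start), min(end, length)):
            cov[strand][i] += 1
    return cov


def gaps_on_both_strands(length, reads):
    """Positions not yet read on at least one of the two strands."""
    cov = strand_coverage(length, reads)
    return [i for i in range(length) if cov["+"][i] == 0 or cov["-"][i] == 0]


# Hypothetical reads on a 10-nt insert.
EXAMPLE_READS = [(0, 6, "+"), (4, 10, "+"), (0, 5, "-"), (5, 8, "-")]
print(gaps_on_both_strands(10, EXAMPLE_READS))  # [8, 9]
```

Here positions 8 and 9 have been read on the plus strand only, so one more minus-strand reaction would be needed over that stretch.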



[illegible] was labelled with polynucleotide kinase (Maniatis et al., 1982). Approximately [illegible] of the labelled oligonucleotide were added to [illegible] µg of mRNA in a 13 µl mix.



Mapping of the initiation site for transcription

The transcription cap-site of napin mRNA was determined by primer extension on mRNA. A synthetic oligonucleotide, complementary to mRNA sequences close to the initiation ATG, was 32P end-labelled, annealed to mRNA and subsequently elongated to the 5' end of napin mRNAs by the incorporation of unlabelled nucleotides mediated by AMV reverse transcriptase. Figure 5 shows the elongated and terminated primer alongside the sequence reactions obtained by letting the same oligonucleotide, unlabelled in this case, prime sequencing reactions on an M13 shotgun clone that covered this region on the minus strand. When mapped onto the sequence of the napA gene, the major initiation site is at the A in position 1102. The minor bands correspond to positions [illegible], 1112 and 1113. Thus, the major site of transcriptional initiation appears to be located [illegible] nucleotides downstream from a sequence which conforms to the consensus of a TATA box (see below).
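The arithmetic behind this mapping can be sketched in Python. An antisense primer whose 5' end pairs with plus-strand position p yields an extension product of length L, so the cap site lies at p - L + 1. A TATA-like motif can then be sought upstream. The consensus used here, TATA(A/T)A(A/T), and all coordinates are assumptions for illustration. They are not the napA values.

```python
import re

# TATA consensus written as a regular expression: TATA(A/T)A(A/T).
TATA = re.compile(r"TATA[AT]A[AT]")


def cap_site_from_extension(primer_5prime_pos, product_length):
    """1-based plus-strand coordinate of the mRNA 5' end.

    The antisense primer's 5' end pairs with plus-strand position
    primer_5prime_pos; reverse transcriptase copies back to the cap site,
    so a product of product_length nt ends at pos - length + 1.
    """
    return primer_5prime_pos - product_length + 1


def nearest_tata_upstream(seq, cap_site, window=50):
    """(tata_start, distance) for the closest TATA-like motif lying wholly
    within `window` nt upstream of the cap site, or None.

    Coordinates are 1-based; distance is counted from the first T of the
    motif to the cap site. finditer scans left to right, so the last
    (non-overlapping) match is the one closest to the cap site.
    """
    lo = max(1, cap_site - window)
    best = None
    for m in TATA.finditer(seq.upper(), lo - 1, cap_site - 1):
        start = m.start() + 1
        best = (start, cap_site - start)
    return best


# Hypothetical promoter: TATAAAT at 21-27, transcript starts at the A at 51.
TOY_PROMOTER = "G" * 20 + "TATAAAT" + "G" * 23 + "A" + "C" * 20
cap = cap_site_from_extension(primer_5prime_pos=70, product_length=20)
print(cap, nearest_tata_upstream(TOY_PROMOTER, cap))  # 51 (21, 30)
```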

General features of the sequence

Figure 6 shows the sequence of the [illegible] nucleotides of the HindIII-BglII subclone insert. The translated sequence of the coding region is also shown above the nucleotide sequence in one-letter code. The sequence that is contained in the pNAP1 cDNA clone (Ericson et al.) is shown within brackets and is absolutely identical to that of napA. In this context it is worth noting that the pNAP1 cDNA clone was isolated from a cDNA library constructed from the same dihaploid rapeseed line that was used for these studies. A thick arrow indicates the major transcription start site. This is preceded by an underlined TATA-containing sequence (Breathnach and Chambon, 1981). A dotted line shows an imperfect CAAT box (Breathnach and Chambon, 1981), which is not functionally located in relation to the TATA box. On the 3' side of the coding region the polyA addition signal (Proudfoot and Brownlee, 1976) is found, underlined by a solid line. A dot above nucleotide [illegible] indicates the actual site where the polyA tail is added, as deduced from a comparison with the pN1 and pN2 cDNA clones (Crouch et al., 1983). Figure 6 also shows a second putative TATA box and polyA addition signal (dashed underline) at nucleotides [illegible] and 2[illegible]11, respectively. We presently do not know whether this part of the sequence represents an expressed portion of the genome. Considering the size of a hypothetical transcript and the relative positions of ATGs and termination codons within it, we think it is less likely that this sequence is expressed, at least at the protein level. However, we are presently testing this point by direct examination.
On the opposite strand there are several TATA boxes and polyA addition signals. An A/T-rich sequence between positions [illegible] on the opposite strand corresponds to [illegible] closely spaced polyA addition signals. One additional polyA addition signal occurs at position [illegible] (plus strand numbering). Towards the 3' end of the minus strand, three TATA boxes occur at positions [illegible] (plus strand numbering), the first two of which are part of a 14 bp direct repeat. Slightly further downstream on the minus strand are two additional polyA addition signals, at 306 and [illegible] (plus strand numbering). Although we seriously doubt whether any of the above sequences constitute functional signals, we cannot at present strictly rule it out.
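Searching both strands while reporting hits in plus-strand numbering, as done above, can be sketched as follows. It assumes the AATAAA hexamer as the polyA addition signal and uses a hypothetical test sequence. The function names are illustrative only.

```python
import re


def revcomp(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif_both_strands(seq, motif="AATAAA"):
    """(strand, start, end) for every motif copy on either strand.

    Coordinates are 1-based and inclusive in plus-strand numbering, so a
    minus-strand hit is reported where its reverse complement lies on the
    plus strand. Assumes a non-palindromic motif such as AATAAA.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        for m in re.finditer(f"(?={pattern})", seq):
            hits.append((strand, m.start() + 1, m.start() + len(pattern)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical sequence: one plus-strand and one minus-strand signal.
print(find_motif_both_strands("CCAATAAAGGTTTATTCC"))
# [('+', 3, 8), ('-', 11, 16)]
```

A minus-strand AATAAA appears on the plus strand as TTTATT. This is why the search pattern for the "-" strand is the reverse complement.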

Hairpins, repeats and palindromes

The major direct and inverted repeats of the sequence are indicated by arrows, pairwise connected as indicated by the lettering. Solid arrows indicate perfect repeats and dotted arrows imperfect ones. One repeat has been observed and discussed previously (Ericson et al.). In addition to the repeats shown in Figure 6, the region [illegible] has several overlapping direct repeats. A two-headed arrow indicates a slightly imperfect palindromic sequence present some [illegible] nucleotides upstream of the TATA box. Two different regions, nucleotides [illegible] and [illegible], display features which appear to result in quite a strong tendency to form hairpins (and possibly cruciform structures). Several other regions in the sequence exhibit some tendency to form hairpins. However, the unique feature of the regions discussed here is that within a rather short stretch of nucleotides several different hairpin structures can be generated simply by sliding the hairpin arms relative to each other. The two sequences can form 4 and [illegible] different hairpin structures, respectively, with between 9 and 15 base pairs in the stem (data not shown). With regard to the former of these sequences, the corresponding region in the pgNa napin gene (S. Scofield, personal communication) differs somewhat from napA, but is still able to form 5 different alternative hairpins with between [illegible] and 13 pairs in the stem. In addition, these regions in napA and in the pgNa gene are both very A/T-rich. This strengthens the suggestion that these sequences may be involved in a local perturbation of the DNA structure. The sequences involved are shown within brackets in Figure [illegible]. Both of the sequences in the napA gene are rather suggestively placed, one (positions [illegible]) upstream and the other (positions [illegible]) downstream from the transcribed part of the gene. No doubt, the possibility of forming several alternative hairpins could be of importance in stabilizing a non-B-DNA structure, particularly if the regions are under negative superhelical stress ([illegible] et al., [illegible]). However, it has been argued that the kinetics of cruciform formation may restrict the importance of such reactions in vivo (Courey and Wang, 1983). The question of whether transcriptional activation of eukaryotic chromatin is at all influenced by torsional stress in vivo is also controversial; data favouring both opinions have been put forward (Harland et al., [illegible]; Sinden et al., 1980).
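The idea of hairpin arms "sliding" relative to each other can be made concrete by enumerating perfect stems. The brute-force sketch below is an assumption-laden illustration: it allows Watson-Crick pairs only, sets no energy model, and takes `min_stem` and `min_loop` as hypothetical thresholds. It is not the method the authors used. Each arm register that forms a stem is reported as a separate entry, so a region able to form several alternative hairpins yields several overlapping entries.

```python
# Watson-Crick pairs only; G-T wobble pairs are deliberately ignored here.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def hairpins(seq, min_stem=4, min_loop=3):
    """Enumerate perfect hairpin stems in a DNA strand.

    Returns (left_start, right_end, stem_length) with 0-based inclusive
    coordinates, for each stem whose outermost pair (left_start, right_end)
    cannot be extended outward, that has >= min_stem consecutive pairs and
    leaves >= min_loop unpaired bases in the loop.
    """
    seq = seq.upper()
    n = len(seq)
    found = []
    for i in range(n):
        for j in range(n - 1, i, -1):
            # Report each stem once, from its outermost pair.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            s = 0
            # Adding pair s would close a loop of j - i - 2*s - 1 bases.
            while j - i - 2 * s - 1 >= min_loop and (seq[i + s], seq[j - s]) in PAIRS:
                s += 1
            if s >= min_stem:
                found.append((i, j, s))
    return found


# Hypothetical stem-loop: ATGC ... AAAA loop ... GCAT.
print(hairpins("ATGCAAAAGCAT"))  # [(0, 11, 4)]
```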

Another interesting feature of the promoter region is that within a sequence [illegible] base pairs upstream of the TATA box the trinucleotide CAC is repeated 11 times. Most copies of the CAC trinucleotide occur as part of a tandemly repeated, degenerate heptamer, which in turn is repeated 12 times. Figure 7 shows the region of interest, with the sequences of napA and the pgNa napin gene aligned to emphasize the homologies. The CAC trinucleotides and the degenerate repeats are indicated in the figure. The consensus sequence of the repeat, considering all the different copies, is TACACA[illegible]. The TATA box and initiation ATG are indicated for reference. The major cap-site is also indicated by an arrow.
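Two counts underlie this description. The first is occurrences of a trinucleotide, where overlapping copies such as CACAC must be counted twice. The second is windows that match a degenerate heptamer consensus within a few mismatches. The sketch below performs both. The example stretch, the `TACACAT` consensus and the one-mismatch tolerance are all hypothetical choices for illustration, not the napA data.

```python
import re


def count_overlapping(seq, motif):
    """Occurrences of motif, overlaps included (CACAC contains two CACs)."""
    return len(re.findall(f"(?={motif.upper()})", seq.upper()))


def degenerate_repeats(seq, consensus, max_mismatches=1):
    """0-based starts of windows within max_mismatches of the consensus."""
    seq, consensus = seq.upper(), consensus.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], consensus))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


# Hypothetical stretch and consensus, for illustration only.
EXAMPLE = "TACACATTACACAGTACACAT"
print(count_overlapping(EXAMPLE, "CAC"))           # 3
print(degenerate_repeats(EXAMPLE, "TACACAT", 1))   # [0, 7, 14]
```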



Comparison with other nucleotide sequences

A search with the napA [illegible] region sequences against the three major data banks, as well as against recently published (and not yet entered) sequences of seed storage protein genes from other species, failed to reveal any features that we could tentatively identify as being related to gene regulation. We were also unable to find sequences in napA related to SV40 enhancer core sequences (Weiher et al., 1983) unless allowing for 3 or more mismatches.
Figure 3: Restriction map of the genomic region containing the napA gene. Individual lambda recombinant clones were mapped as described in Materials and Methods. The figure shows the map of the genomic region and the parts contained in different recombinants. The measuring bar corresponds to 5 kb of DNA. The enzymes used were B-BamHI; C-SacII; E-EcoRI; G-BglII; H-HindIII; S-SalI and N-NruI. The hatched area indicates the part that hybridized to pNAP1.
Figure 4: Restriction map of napA and sequencing strategy. The [illegible] kb HindIII-BglII subclone in pUC19 was mapped with conventional techniques. The figure shows the map obtained for the insert and how it compares to the map previously obtained for pNAP1 cDNA. The measuring bar corresponds to 1 kb of DNA. The enzymes used were A-SacI; C-SacII; G-BglII; H-HindIII; P-ApaI; T-PstI; [illegible]-XhoI and [illegible]-NaeI. Below the map is a schematic representation of the sequencing strategy, as discussed in the text. X denotes reactions primed by the universal 17-mer primer on either shotgun clones or restriction enzyme-derived M13 subclones; [illegible] denotes reactions primed by synthetic [illegible]-mer primers within different subclones.
We have begun the molecular characterization of genes encoding napin, the 1.7 S embryo-specific storage protein of Brassica napus. Genomic Southern blot analysis indicates that napin is encoded by a multigene family comprised of a minimum of 16 genes. Two DNA fragments containing single napin genes have been recovered from B. napus genomic libraries. We have determined the nucleotide sequence of one member of the napin gene family, gNa. The gene has a simple structure lacking introns and containing the canonical features expected for genes transcribed by RNA polymerase II. The site of the initiation of transcription was determined to be 37 base pairs upstream of the initiation codon by S1 and primer extension analyses. A gene-specific hybridization probe from the 3' nontranslated portion of gNa was used to demonstrate transcription of gNa.



As the sequences of seed proteins from different plants become known, homologies between proteins with drastically different properties are being detected. For example, several of the diverse 2 S proteins found in seeds have been shown to share sequence homology: the methionine-rich Brazil nut storage protein,¹ the allergenic storage protein in castor bean endosperm (Sharief and Li, 1982), the very basic 1.7 S storage protein in rapeseed embryos (Crouch et al., 1983), and a trypsin inhibitor from barley (Odani et al., 1983). Also, these proteins are related to the prolamin storage proteins such as γ-secalin from rye (Kreis et al., 1985) and α-gliadin from wheat (Kasarda et al., 1984), even though the prolamins are much larger and are hydrophobic rather than hydrophilic. In many cases, the properties of the specific proteins are the result of repeated sequences that differ between them (Higgins, 1984). Despite the different physical properties conferred by these repeats, all of the proteins accumulate to high levels during seed development, are stored during the period of developmental arrest separating embryogeny from germination, and are then degraded during seedling growth. Thus, the basic pattern of temporal expression has been retained. This class of storage proteins is particularly important for animal nutrition, since they usually have higher levels of the sulfur-containing amino acids than the other abundant seed proteins (Youle and Huang, 1981).

* This work was supported in part by National Science Foundation Grant PCM-83-16403 (to M. L. C.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J02782.

‡ Recipient of a Floyd Memorial Fellowship. Present address: Dept. of Molecular Genetics, Plant Breeding Institute, Cambridge CB2 2LQ, Great Britain.

1 S. Sun, personal communications.

We have been studying the expression of the genes for the 1.7 S storage proteins from Brassica napus L. (rapeseed), the napins. Using a cloned cDNA probe from one of the napin family members, transcripts can first be detected early in embryo development, just after the major tissue systems have been delineated (Crouch et al., 1985). Levels of napin mRNA increase until they constitute about 8% of the total mRNA at the end of cell division,² stay high for 15 days, and then decrease to barely detectable levels in dry seeds. Napin transcripts cannot be detected at any other time in development. However, this pattern of expression reflects the average of several napin genes. In order to study regulation of napin gene expression in detail, it is necessary to analyze family members individually.

In this paper, we begin an analysis of the napin gene family by determining the minimum number of napin genes and by cloning and sequencing one member of the family. From S1 protection and primer extension experiments, we have determined where in the sequence transcription begins and that this family member is expressed.

MATERIALS AND METHODS³

RESULTS

Napin Gene Family — It is clear from genomic Southern blots that napin is encoded by a family of genes. At least 14 fragments, ranging from 2 to 23 kb⁴ in size, hybridize with different intensities to a napin cDNA probe pN1 when genomic DNA is restricted with EcoRI (Fig. 1A). EcoRI does not cleave within any cloned napin sequence. The hybridization pattern observed is the same whether the probed DNA is made from a single plant or from a population, indicating that this pattern is not due to population polymorphism (data not shown). The hybridization pattern is also unchanged when probes representing the 5' and 3' halves of the pN1 coding sequence are tested, indicating that all the bands are due to homology with the napin coding sequence and not to a repeated sequence in one portion of the cDNA clone pN1 (data not shown).

Fig. 1B is a genomic reconstruction experiment. The genomic clone λBnNa, described later, was digested with EcoRI,

2 A. J. DeLisle and M. L. Crouch, unpublished data.

3 Portions of this paper (including "Materials and Methods" and Figs. 2 and 3) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are available from the Journal of Biological Chemistry, 9650 Rockville Pike, Bethesda, MD 20814. Request Document No. 86M4334, cite the authors, and include a check or money order for $3.20 per set of photocopies. Full size photocopies are also included in the microfilm edition of the Journal that is available from Waverly Press.

4 The abbreviations used are: kb, kilobase(s); bp, base pair(s).
[Figure 1: panel A, EcoRI genomic Southern blot; panel B, reconstruction lanes at 1, 3, 5 and 10 genomic equivalents; size markers 23.3, 9.4, 6.6, 4.4 and 2.2 kb]

FIG. 1. A, genomic Southern blot of an EcoRI digest of B. napus DNA probed with nick-translated pN1 and washed at Tm - 22 °C. Dots have been placed by single copy signals; 2 designates signals with intensity corresponding to two copies. B, genomic reconstruction: lane 1, 10 µg of B. napus DNA digested with EcoRI; lanes 2-5, λBnNa DNA digested with EcoRI and loaded to simulate 1, 3, 5 and 10 haploid genomic equivalents based on [illegible] pg/haploid napus genome (Verma and Rees). The filter was probed with nick-translated pN1.




and dilutions representing 1, 3, 5, and 10 copies/haploid genome were electrophoresed beside EcoRI-digested genomic DNA. We conclude that the fragments which have the least intense signals contain single napin genes, and the stronger signals represent two or more genes. By this analysis there are at least 16 napin genes/haploid genome. The more intense signals result either from fragments of similar size that contain single genes or from linkage of two or more napin genes on an EcoRI fragment.
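In a reconstruction experiment of this kind, the amount of cloned DNA that mimics n copies per haploid genome follows from the mass fraction of the clone in the genome: (clone length / haploid genome length) x genomic DNA loaded x n. The sketch below performs that calculation. The conversion of roughly 0.978 x 10^9 bp per pg is a commonly used approximation. The 10 µg load matches the figure legend, but the 1.0 pg genome size and the 45 kb phage length are hypothetical placeholders, not the paper's values.

```python
# Approximate conversion for double-stranded DNA: 1 pg ~ 0.978e9 bp.
BP_PER_PG = 0.978e9


def clone_mass_for_copies(genomic_ug, haploid_genome_pg, clone_bp, copies):
    """Nanograms of cloned DNA equivalent to `copies` per haploid genome
    in a lane loaded with `genomic_ug` micrograms of genomic DNA."""
    genome_bp = haploid_genome_pg * BP_PER_PG
    fraction = clone_bp / genome_bp                 # mass fraction of one copy
    return genomic_ug * 1000 * fraction * copies    # ug -> ng


# Hypothetical inputs: 10 ug genomic DNA, a 1.0 pg haploid genome and a
# 45 kb recombinant phage; the paper's own genome-size value is not used.
for n in (1, 3, 5, 10):
    print(n, round(clone_mass_for_copies(10, 1.0, 45_000, n), 2), "ng")
```

Because mass per base pair is constant, the length ratio equals the mass ratio. The loading scales linearly with copy number.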

Isolation of Genomic Napin Clones — A genomic library was constructed in the λ vector EMBL-[illegible] from B. napus DNA digested partially with Sau3A. Two unique napin genomic clones, designated λBnNa and λBnNb, were isolated when 4 × 10^5 recombinant phage were screened by plaque hybridization with a nick-translated pN1 napin cDNA probe (Crouch

The napin genomic clones were analyzed by restriction endonuclease mapping and Southern blot hybridizations. Each phage contains just one napin gene, and only the napin gene region hybridizes to cDNA made from embryo RNA, indicating that no other abundant embryo transcripts are encoded by the cloned DNA (data not shown). Comparison of restriction maps derived for genomic napin subclones with those of the cDNA clones pN1 and pN2 shows that these genes do not encode the messages represented by the cDNA clones (Fig. 2). λBnNa was chosen for more thorough examination.

Nucleotide Sequence of Napin Gene — The 3.3-kb EcoRI fragment containing the λBnNa napin gene was subcloned in pUC[illegible] (Vieira and Messing, 1982) and designated pgNa. One kb to the right of the first EcoRI site, as drawn in Fig. [illegible], has



FIG. 4. Nucleotide sequence and deduced amino acid sequence of gNa. The [illegible] labeled [illegible] are alternating purine-pyrimidine stretches of [illegible] bp or longer. Also labeled are: the CAAT sequence at [illegible]; the TATA box at [illegible]; the cap site mapped by primer extension at [illegible]; the initiator ATG at [illegible]; and three sequences with homology to the consensus polyadenylation signal at [illegible]54, [illegible] and [illegible]04.



Napin Nucleotide Sequence



been sequenced by the method of Maxam and Gilbert (Maxam and Gilbert, 1980). This sequence is comprised of [illegible] coding, [illegible] 5' and 17[illegible] 3' flanking bp (Fig. 4). The napin coding region is the only open reading frame of significant length on either strand. The [illegible] sequence is A/T-rich ([illegible]%) and is marked by many blocks of 4-[illegible] A or T residues. A TATA box closely matching the consensus is found 70 bp upstream from the ATG codon initiating the napin precursor; this is the first ATG codon 3' of the TATA sequence. Forty-two bp upstream of the TATA box is the sequence CAAT (position [illegible], Fig. 4). Though it is in the expected position, this sequence shows only [illegible] bp of homology to the [illegible]-bp consensus element shown to be important for efficient promoter recognition (Benoist et al., 1980). There are also several stretches of alternating purine-pyrimidine residues upstream of the TATA box: between positions [illegible] and [illegible] there is a [illegible]-bp alternating purine-pyrimidine tract, at position [illegible] a block of [illegible] alternating purine-pyrimidine residues occurs, and at position [illegible] an 8-bp one is found.
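The claim that the coding region is the only open reading frame of significant length on either strand is the kind of statement a six-frame scan checks. The sketch below is a minimal version. It assumes ATG-initiated frames terminated by TAA, TAG or TGA, and it ignores frames that run off the end without a stop. The test sequence is hypothetical, and this is not the authors' analysis software.

```python
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq):
    """Longest ATG-to-stop open reading frame on either strand.

    Returns (strand, start, end): 0-based, end-exclusive coordinates on the
    strand where the ORF was found, stop codon included; None if no ORF.
    ORFs running off the end of the sequence without a stop are ignored.
    """
    best = None
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if best is None or i + 3 - start > best[2] - best[1]:
                        best = (strand, start, i + 3)
                    start = None
    return best


# Hypothetical sequence with a 5-codon ORF on the plus strand.
print(longest_orf("CCATGAAACCCGGGTAGTT"))  # ('+', 2, 17)
```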

The [illegible] region is high in AT content (67%). [illegible] are found to contain multiple sequences resembling the consensus element associated with polyadenylation of mRNA (Fitzgerald and Shenk, 1981), and three of these elements are present in the gNa sequence, at nucleotides [illegible], [illegible] and [illegible] (Fig. 4).
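A percentage such as the 67% AT content quoted above is the fraction of A and T among the unambiguous bases of the region. The short sketch below performs that calculation. Excluding ambiguous positions such as N from the denominator is an assumption made here, and the example fragment is hypothetical.

```python
def at_content(seq):
    """Percentage of A+T among the unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * sum(b in "AT" for b in bases) / len(bases)


# Hypothetical fragment; ambiguous N positions are excluded from the count.
print(round(at_content("AATTGCNN"), 1))  # 66.7
```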

Comparison of the coding sequences of gNa and the cDNA clones pN1 and pN2 indicates that there are no introns and that all three coding sequences terminate with a single TAG codon. As with [illegible] sequences, there is some divergence between the genomic and cDNA clones. For example, when the gNa sequence is aligned for maximum homology with the cDNA, [illegible]. Excluding insertions, the two sequences are [illegible]% homologous at the nucleotide level, with [illegible] of the nucleotide substitutions occurring in the third base of the codon. Alignment of the gNa- and pN2-deduced peptide sequences shows that 15 amino acids have been substituted (excluding the gNa insertions), but that only [illegible] of the substitutions are conservative (hydrophobic to hydrophobic, for

Expression of gNa. Determining the expression of a particular
napin family member by hybridization requires a gene-specific
probe. The nontranslated portions of genes often provide such a
probe. The 0.4-kb […]-BamHI fragment of gNa complementary to the
3' nontranslated portion of the gNa transcript was nick-translated
and used to probe duplicate genomic Southern blots of
EcoRI-digested B. napus DNA (Fig. 5A). This probe hybridizes to
just two bands at Tm − […] °C and specifically to the […]-kb gNa
EcoRI fragment at Tm − […] °C (Fig. 5A). Duplicate blots of
size-fractionated B. napus embryo RNA were hybridized and washed
in parallel with the DNA filters under the conditions that give
gene-specific DNA/DNA hybridization. A signal was detectable on
the Northern blot corresponding to a napin-sized transcript
(Fig. 5B). […] hybridization was evident when the DNA and RNA
blots were washed at Tm − […] °C, however (data not shown).

Mapping the Terminus of the gNa Transcript. Our first studies of
the initiation site of transcription employed S1 nuclease mapping
(Fig. 6). The 0.[…]-kb SalI-EcoRI fragment was 5' end labeled at
the SalI site, and the labeled strand was purified on a
polyacrylamide gel for use as a probe. The same fragment was
sequenced to provide accurate electrophoretic size standards.
Aliquots of this probe were hybridized at either Tm − […] °C or
Tm − […] °C with […] µg of B. napus embryo RNA. After digestion of
the resulting hybrids, the longest protected probe fragment was
[…] bp, indicating



[Fig. 5 image not reproduced: map of pgNa with EcoRI, SalI and BamHI sites, the TAG codon and the 0.4-kb probe; blots of EcoRI-digested DNA and embryo RNA.]

FIG. 5. Expression of gNa. A, Southern blots with 10 µg of
EcoRI-digested genomic DNA. Lane 1 was probed with
nick-translated pN1 and washed at […] °C in 0.1 × SSC, conditions
that allow hybridization to different napin family members. Lanes
2 and 3 show the gene-specific hybridization of the
nick-translated […]-BamHI fragment […]. Lane 2 was washed at
[…] °C and lane 3 at 60 °C in 0.1 × SSC. B, duplicate Northern
blots with total B. napus embryo RNA probed and washed as in
panel A. In lanes 2 and 3, under the same conditions that give
gene-specific hybridization, a napin-size transcript is detected
[…].

initiation at the […] numbered […] in Fig. 4. Also apparent are
signals corresponding to cleavage in the blocks of dA residues
located 4 and 10 bp downstream from the initiation site, but the
reason for cleavage at these sites is unclear. Local denaturation
in the AT-rich regions seems unlikely, as these signals are
generated under nonstringent S1 nuclease digestion conditions. It
is possible that these signals represent other initiation points
for the same gene or different end junctures of transcripts from
other napin genes which are able to hybridize with the probe,
which contains […] bp of coding sequence.

Primer extension analysis employing a synthetic oligonucleotide
primer without coding sequence was undertaken to define the 5' end
of the gNa transcript more specifically (Fig. 7). An oligomer was
synthesized that was complementary to the […] bases immediately 5'
to the gNa initiation codon. Hybridized to embryo RNA, this primed
the reverse transcription of a product extending 22 bases beyond
the oligomer, indicating RNA initiation at the dA numbered 251 in
Fig. 4, just one
[Fig. 6 image not reproduced: map of the S1 probe (XhoI, TAG and BamHI sites) and autoradiograph, lanes 1-7.]



FIG. 6. S1 nuclease mapping of gNa. The […]-SalI probe fragment
was labeled at the SalI 5' terminus and strand separated. Lane 1,
probe fragment only; lane 2, probe digested with […] units/ml S1
nuclease; lane 3, probe fragment hybridized to […] µg of […] RNA
and digested with […] units/ml S1 nuclease […]. In the remaining
lanes the probe has been hybridized with […] µg of B. napus total
embryo RNA. Lane 4, hybridization was performed at Tm − […] °C
followed by digestion with […] units/ml S1 nuclease. Lane 5,
hybridization was performed at Tm − […] °C and then digested with
[…] units/ml S1 nuclease. Lane 6, hybridization was performed at
Tm − […] °C and then digested with […] units/ml S1 nuclease.
Lane 7, hybridization was performed at Tm − […] °C followed by
digestion with […] units/ml S1 nuclease. The […]-kb probe was
sequenced to provide size standards […].

base short of the site mapped by S1 nuclease protection. Since
eukaryotic mRNAs are capped at purine residues (Cory and Adams,
[…]), we expect the authentic RNA initiation site of gNa
transcripts to be the dA (position […], Fig. 4) indicated by
primer extension analysis. The gNa transcript thus has a 5'
nontranslated leader […] nucleotides long.

The primer extension experiments were also used to address the
expression of gNa by carrying out reverse transcription in the
presence of dideoxyribonucleotides to determine the sequence of
the primer extension product (Fig. 7). A sequence consistent with
the expression of gNa can be detected, although the extent to
which this portion of the gNa sequence is conserved among the
napin genes is not yet known. The presence of heterogeneous
signals in the sequencing ladder indicated that the primer
hybridized with other napin transcripts.

FIG. 7. Primer extension analysis of gNa. […]

Of approximately 16 napin genes in B. napus, one has been
sequenced by us, gNa, and another, napA, by L.-G. Josefsson
(personal communication). We have previously reported the
sequences
Napin Nucleotide Sequence

[Fig. 8 alignment not reproduced: napA, gNa, pN1 and pN2, nucleotides 1-1056, with RY blocks, TATA boxes, ATG and TAG codons and the start points of pN1 and pN2 marked.]



FIG. 8. One kb of the gNa genomic sequence has been aligned for maximum homology with napA and
two cDNA clones, pNl and pN2. To emphasize the close homology between the cDNAs and napA only the 
cDNA bases that differ from napA have been displayed. Dots indicate positions where gaps have been introduced 
into a sequence for alignment purposes. Conserved features which have been designated are: the alternating purine- 
pyrimidine blocks (RY), the TATA boxes, the initiation and termination codons, and the 12 bp of homology shared 
at the most downstream consensus sequence associated with polyadenylation. 
of two different cDNA clones representing transcripts from
other genes (Crouch et al., 1983). Thus, four members of the
family have been examined, although their relative levels of 
expression are not known. Comparison of all four coding 
sequences (Fig. 8) indicates that the cDNAs and napA are 
greater than 95% homologous. The gNa sequence with inser- 
tions at positions 521, 588, 714, and 734 of Fig. 8 is likely to 
represent a minor class of napins, perhaps one of the four 
discrete species fractionated by Lonnerdal and Janson (1972). 
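Homology percentages such as the figure above are the fraction of matching positions in an alignment. The Python sketch below is one minimal way to compute such a value from two pre-aligned sequences; it assumes that gap columns (written as dots, as in the napin alignment) are skipped rather than penalized, which may differ from how the authors scored their alignment. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = ".") -> float:
    """Percent identity between two pre-aligned nucleotide sequences.

    Columns in which either sequence carries a gap character are skipped,
    so insertions do not count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # insertion/deletion column: excluded from the comparison
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (not napin data): one mismatch in nine compared columns.
print(round(percent_identity("ATGGCTAAG", "ATGGCAAAG"), 1))  # 88.9
# A three-base gap is ignored rather than scored.
print(percent_identity("ATG...AAG", "ATGCCCAAG"))  # 100.0
```

Skipping gaps keeps insertions, such as the extra codons in gNa, from lowering the identity score; a stricter measure would count each gap column as a mismatch.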

The 3' nontranslated regions of the cDNAs and napA are 
as highly conserved as the coding regions. Such high homology 
would preclude the use of these sequences for gene-specific 
hybridization as was possible for gNa. One of the distinctive 
features of this portion of gNa is the presence of three se- 
quences resembling the consensus associated with polyade-
nylation of mRNAs. It is striking that although the gNa 3'
nontranslated region is divergent, all four napin sequences 
are perfectly homologous for 12 bp around the most down- 
stream consensus polyadenylation element, suggesting that 
this is the authentic polyadenylation signal for the genomic 
clones. 
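Finding candidate polyadenylation elements, and picking the most downstream one, amounts to a pattern scan. A minimal Python sketch, assuming the canonical hexamer AATAAA as the consensus and 1-based coordinates; the helper names and the toy sequence are hypothetical and are not taken from gNa or napA.

```python
import re
from typing import List, Optional

POLYA_CONSENSUS = "AATAAA"  # canonical polyadenylation hexamer (assumed consensus)


def polya_sites(seq: str, motif: str = POLYA_CONSENSUS) -> List[int]:
    """1-based start positions of every exact match, overlapping hits included."""
    # A zero-width lookahead lets the regex report overlapping occurrences.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


def most_downstream_site(seq: str) -> Optional[int]:
    """Position of the 3'-most match, or None if the motif is absent."""
    hits = polya_sites(seq)
    return hits[-1] if hits else None


# Toy 3' region (invented): three elements, two of them overlapping.
region = "TAGCCAATAAAGTTAATAAATAAACC"
print(polya_sites(region))           # [6, 15, 19]
print(most_downstream_site(region))  # 19
```

Exact matching only finds the canonical hexamer; elements that merely "resemble" the consensus would need an approximate search with a mismatch allowance.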

The nucleotide sequence of the genomic clone gNa and its 
flanking regions contain the canonical features expected of 
plant genes transcribed by RNA polymerase II (Messing et
al., 1983). There are no introns, which is characteristic of
genes for many of the other 2S seed proteins and related
cereal prolamins. In the 5' flanking region of gNa are several
blocks of alternating purine-pyrimidine nucleotides, which
have been observed in viral enhancers (Lusky et al., 1983).
Their significance in napin genes remains to be tested. 

Alignment of the two napin genomic clones for maximum 
homology (Fig. 8) shows that the coding sequence of gNa is 
24 bp longer than napA with the extra sequence occurring as 
three additions of single codons and two insertions of two 
codons. However, the 5' RNA leader region of gNa is deleted 
by 10 nucleotides relative to napA. As already mentioned, the 
two genomic sequences diverge sharply past the coding se- 
quence termination codons. In contrast, the 5' flanking region 
is highly conserved overall, including the regions of alternat- 
ing purine-pyrimidine residues. Since the entire 5' flanking 
region is so highly conserved, it is difficult to single out regions 
by comparative homology that might be involved in the tem- 
poral or spatial regulation of napins. 

As mentioned earlier, napin is evolutionarily related to 
some of the cereal prolamin storage proteins. However, there 
is no evidence in napin genomic sequences of homology to the 
short upstream sequences found to be conserved in the genes 
for α-gliadin, B-hordein, and the (unrelated) zeins (Forde et
al., 1985). If the conserved prolamin sequence is functionally
significant, its absence in napin may be related to the differ- 
ence in spatial expression; napin is synthesized in the embryo, 
whereas prolamins are restricted to endosperm cells. 
SUMMARY



We have isolated and characterized genes encoding the sunflower 11S globulin seed storage proteins,
collectively termed helianthinin. One gene, designated HaG3, has a primary transcription unit of about 1750
nucleotides including two short intervening sequences. The predicted precursor polypeptide from HaG3 is 493
amino acids long, is rich in glutamine and other nitrogen-rich amino acids and includes the amino acid sequence
NGVEETICS. This sequence is highly conserved among 11S seed storage proteins and is involved in the
proteolytic processing of these polypeptides. Additional helianthinin sequences are conserved among other seed
storage protein genes. Analysis of various cDNA and genomic sequences indicates helianthinins are encoded
by a small gene family that includes a minimum of two divergent subfamilies.



INTRODUCTION 

Like embryos of other oilseed plants, sunflower
embryos accumulate and store large quantities of
lipid and protein. These stored materials are utilized
by the seedling following germination and, in addi-
tion, are of immense agronomic importance. The
organization and expression of genes encoding seed
storage proteins have been investigated in a number
of plant species, including both dicots and monocots
(reviewed by Shotwell and Larkins, 1988). In all
cases, the accumulation of storage proteins during
seed development and maturation requires the highly
regulated expression of genes encoding these pro-
teins. Substantial post-translational modifications and
targeting to appropriate subcellular compartments



Correspondence to: Dr. T.L. Thomas, Biology Department, Texas
A&M University, College Station, TX 77843 (U.S.A.)
Tel. (409) 845-0184; Fax (409) 845-2891.
Present address: (R.D.A.) Department of Biology, Washington
University, St. Louis, MO 63130 (U.S.A.) Tel. (314) 889-6883;
(E.A.C.) Department of Genetics, University of Georgia,
Athens, GA 30602 (U.S.A.) Tel. (404) 542-1444.



Abbreviations: aa, amino acid(s); bp, base pair(s); 2D, two
dimensional; DPF, days post-flowering; Denhardt's solution,
0.02% bovine serum albumin, 0.02% Ficoll and 0.02% polyvinyl-
pyrrolidone; ER, endoplasmic reticulum; nt, nucleotide(s); ORF,
open reading frame; PAGE, polyacrylamide gel electrophoresis;
SDS, sodium dodecyl sulfate; SET, 0.15 M NaCl, 0.02 M
Tris-HCl, 0.002 M EDTA (pH 8.0). For nucleotide sequences:
H = A, C or T; M = A or C.
are also necessary. Consequently, these genes pro-
vide an excellent opportunity for analysis of the
molecular mechanisms controlling many aspects of
ontogenic gene expression in plants.

Sunflower seed proteins include the water-soluble 2S albumins and
the salt-soluble 11S globulins. The sequence and expression of
albumin structural genes have been described (Allen et al., 1987a,
1987b). The sunflower 11S storage protein, designated helianthinin,
is structurally similar to legumin-like seed proteins of other plant
species and is represented in planta by an approximately 300-kDa
hexameric holoprotein (reviewed by Shotwell and Larkins, 1988). Each
subunit of the holoprotein consists of a larger, acidic (α)
polypeptide (30-40 kDa) and a smaller, basic (β) polypeptide
(23-27 kDa) linked by disulfide bonds. The α and β polypeptides of
legumin-like proteins such as the helianthinins are generated
proteolytically from larger precursor polypeptides that are
synthesized on the rough ER. An NH2-terminal signal peptide targets
the nascent polypeptide to the lumen of the rough ER, where it is
removed. The 11S precursors assemble into trimers in the ER and are
then transported to the vacuole through the Golgi. Once in the
vacuole, 11S precursors are cleaved into disulfide-linked α and β
polypeptides. The trimers then assemble into hexamers, and following
additional protein accumulation, the vacuole subdivides to form the
protein bodies characteristic of many plant seeds (Higgins, 1984).

The cloning and expression of helianthinin mRNAs have been
described (Allen et al., 1985; Allen et al., 1987b). Synthesis of
helianthinin mRNAs and precursor polypeptides is tightly regulated
during sunflower embryogenesis. Helianthinin α and β subunits first
appear about 7 DPF, two days after the albumin seed proteins appear
(Cohen, 1986), and like the albumins, these polypeptides continue to
accumulate through much of sunflower seed development. Helianthinin
mRNAs are also detected 7 DPF; these transcripts accumulate and
disappear with kinetics similar to those observed for albumin mRNA
(Allen et al., 1987b).

In this paper, we describe the isolation and characterization of
genes encoding helianthinin in sunflower. Sequence and S1 nuclease
analysis of one gene, designated HaG3, defined a primary
transcription unit of about 1750 nt, including two short intervening
sequences. The helianthinin polypeptide predicted from the
nucleotide sequence of HaG3 shares significant, functional sequence
homologies with other 11S seed storage proteins. Analysis of cDNA
and genomic DNA sequences indicates that helianthinins are encoded
by a small gene family that includes at least two divergent
subfamilies. Sequences located 5' of the HaG3 transcription unit
are conserved among other seed storage protein genes.



MATERIALS AND METHODS 

(a) Materials 

Sunflower seeds (Helianthus annuus L. cv. Giant
Grey Stripe, Northrup King Seed Co., Minneapolis,
MN) were obtained commercially. Embryos from
field-grown plants were dissected from achenes at
the indicated times, frozen in liquid nitrogen and
stored at −80°C.

(b) Isolation and labeling of nucleic acids 

Bacteriophage and plasmid DNAs were prepared by standard methods
(Maniatis et al., 1982). Total and poly(A)+ RNA from leaves and
staged sunflower embryos were prepared as described by Allen et al.
(1985). Radiolabeled hybridization probes for genomic library
screening, phage recombinant mapping and genomic DNA blots were
prepared by nick translating a 1.1-kb EcoRI fragment prepared from
the cDNA recombinant Ha2 (Allen et al., 1987b; Allen, 1986).

(c) Plaque hybridization 

Construction of a sunflower genomic library in the bacteriophage λ
vector EMBL3 (Frischauf et al., 1983) has been described (Allen et
al., 1987a). The library was screened for helianthinin phage
recombinants by hybridization with nick-translated cDNA probes
(Benton and Davis, 1977). Filters were prehybridized 4 h and
hybridized 15-18 h at 67°C in 4 × SET, 5 × Denhardt's solution,
0.2% SDS, 100 µg/ml denatured calf thymus DNA, 50 […] poly(A)
and 10 ng/ml poly(C). Filters were washed successively at 60°C in
4 ×, 2 × and 1 × SET containing 0.025 M phosphate buffer and 0.2%
SDS for […] h each, air-dried and autoradiographed. Positive
recombinants were plaque-purified and restriction-mapped by
standard procedures (Maniatis et al., 1982).

(d) Nucleotide sequence analysis

HaG3 DNA was sequenced by the dideoxynucleotide chain termination
method (Sanger et al., 1977) after ligation into M13mp18 and
M13mp19 and transfection into JM101 (Messing et al., 1983).
Single-stranded recombinant phage DNA was processed and sequenced
as described (Sanger et al., 1980). Additional overlapping T4
polymerase deletions of selected recombinants were prepared and
sequenced as described by Dale et al. (1985). The complete sequence
of HaG3 was assembled from these overlapping clones. Computer
analyses were done on a DEC MicroVax using the University of
Wisconsin Genetics Computer Group (UWGCG) Sequence Analysis
Software (Version 5.0; Devereux et al., 1984).

(e) Transcription analysis 

Nuclease mapping of the transcriptional start
point of HaG3 was done as described by Favaloro
et al. (1980) using a 446-bp XhoI-DraI fragment (see
Fig. 1. Physical maps of sunflower helianthinin genes. Panels A and B: restriction maps of […] and HaG3-A1.
Shaded areas indicate regions which hybridize with […]. The transcription unit in HaG3-D1 is shown […].
Dark boxes indicate exons; open boxes indicate introns. B, BglII; E, EcoRI; […].



[Fig. 2 sequence not reproduced: nucleotide sequence of HaG3, nt 1-2826, with the deduced amino acid sequence, intron 1 and the 3' BglII site marked.]

The α/β cleavage site is boxed. Introns 1 and 2 are indicated.
Fig. 2, nt positions 433 to 879) asymmetrically labeled at the
5' terminus of the XhoI site. Total embryo RNA was used. The only
differences in method were that the hybridizations were carried
out for 6-8 h and 10 units of S1 nuclease were used per reaction.
Reaction products were analyzed on polyacrylamide-urea gels. The
446-bp XhoI-DraI fragment was subjected to Maxam-Gilbert
sequencing reactions (Maxam and Gilbert, 1980), which were then
used as length markers in the S1 protection experiments.



RESULTS AND DISCUSSION 

(a) Isolation and characterization of helianthinin
genes

A cDNA recombinant representing helianthinin mRNA was used to
screen a sunflower genomic DNA library constructed in the
bacteriophage λ vector EMBL3 (Frischauf et al., 1983). Multiple
bacteriophage λ recombinants representing the helianthinin gene
family were recovered in these screens. Further analysis of these
recombinants by hybridization with the divergent helianthinin
cDNAs Ha2 and Ha10 (Allen et al., 1987b) defined two divergent
subfamilies that encode helianthinin in sunflower embryos (Fig. 1,
A and B). Two bacteriophage recombinants, HaG10-B1 and HaG10-C1,
hybridize primarily to Ha10; under less stringent hybridization
criteria (6 × SET, 55°C), these recombinants cross-hybridized
weakly with Ha2 (data not shown). Conversely, HaG3-D1 and HaG3-A1
were more closely related to Ha2 than to Ha10. Additional sequence
data presented below confirm these sequence relationships.

(b) Sequence of the HaG3 helianthinin gene



A 2.8-kb region of the genomic recombinant HaG3-D1, bounded by
BglII and […] sites (Fig. 1C), was sequenced to determine the
precise organization of a representative sunflower legumin-like
seed storage protein gene (Fig. 2). Three exons separated by two
very short introns (99 and […] bp) were identified by comparison
to the amino acid sequence predicted from the helianthinin cDNA
Ha2 (Allen, 1986). Intron/exon boundaries were assigned based on
ORF discontinuities at each junction, on the colinearity of HaG3
and Ha2 on either side of each intron, and on the presence of
consensus splice junctions (Mount, 1982). The locations of the
three exons and two introns in HaG3-D1 are schematically shown in
Fig. 1C; the precise sequence locations are displayed in Fig. 2.

The introns in the HaG3 transcription unit differ in number and
location from those observed for the prototypical legA gene of pea
(Lycett et al., 1984). The legA gene has three introns at aa
positions 95, 179 and 388 (henceforth referred to as I1, I2 and
I3). The HaG3 legumin gene has two introns at approximately the
same positions as I1 and I2; I3, however, is missing from the
sunflower gene. The pea […] genes (Gatehouse et al., 1988) and the
Vicia faba LeB4 gene (Bäumlein et al., 1986) each contain two
introns; in these genes, however, I2 and I3 remain and I1 is
absent. Interestingly, two divergent Arabidopsis legumin genes
contain all three introns at approximately the same relative
positions as previously noted for the pea legA gene (Pang et al.,
19[…]).

The HaG3 transcription unit was mapped by S1 nuclease protection
(data not shown). The transcriptional start point is located at nt
position 726 (Fig. 2), 32 nt upstream from the translational
initiation site. Consensus sequence elements typical of RNA
polymerase II transcription units in the region surrounding the
legumin transcription unit are underlined in Fig. 2. These include
a CAAT homology at nt position 635 and a TATA homology at
position 699, both 5' of the transcriptional start point. A
consensus polyadenylation signal, AATAAA, is located 37 nt 3' of
the stop codon.
Sequence elements located 5' of the HaG3 transcription unit are
shared with upstream sequence elements associated with other
storage protein genes. Particularly noteworthy is the conservation
of an element of the legumin (leg) box, a phylogenetically
conserved sequence located approximately […] nt upstream from
several genes encoding […] and legumin-like seed proteins
(Bäumlein et al., 1986). Although the complete leg box is not
conserved in HaG3, three elements that differ from the sequence
AGAATGTC by only one nt are located between 50 and 210 nt upstream
of the […] (indicated by a in Fig. 2). In addition to elements of
the leg box, the consensus sequence HAACAC[…] characteristic of
most seed protein genes (Goldberg, 1986) is present at position 598
in Fig. 2 (indicated by b). Despite the conservation of sequence
and location of the legumin box elements and the CACA motif in
HaG3, the functional significance of these conserved sequences
remains to be determined.
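The search for elements that differ from AGAATGTC by only one nt can be expressed as a Hamming-distance scan. The sketch below assumes that a single strand is scanned and that only substitutions (no insertions or deletions) count as differences; the function names and the test sequence are illustrative, not taken from HaG3.

```python
from typing import List, Tuple

LEG_BOX_ELEMENT = "AGAATGTC"  # element quoted in the text


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def near_matches(seq: str, motif: str = LEG_BOX_ELEMENT,
                 max_mismatches: int = 1) -> List[Tuple[int, str, int]]:
    """Slide `motif` along one strand of `seq` and return
    (1-based start, window, mismatches) for every window within
    `max_mismatches` substitutions of the motif."""
    seq = seq.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, motif)
        if d <= max_mismatches:
            hits.append((i + 1, window, d))
    return hits


# Invented sequence: an exact copy, a one-mismatch copy and a two-mismatch copy.
toy = "TTAGAATGTCGGAGAATGTACCTGTATGTC"
for start, window, d in near_matches(toy):
    print(start, window, d)
# 3 AGAATGTC 0
# 13 AGAATGTA 1
```

The distance of a hit upstream of any chosen reference position r is r minus its start coordinate; a complete search would also scan the reverse complement.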

(c) Molecular characteristics of helianthinin and its
precursors

The precursor polypeptide predicted from the HaG3 sequence is
493 aa and has an Mr of 64.5 kDa. As with most legumin-like seed
proteins, the HaG3 gene product is rich in amide amino acids,
e.g., glutamine and asparagine, and is relatively deficient in
methionine and cysteine (Table I). As expected from previous 2D
PAGE analyses (Allen, 1986), charged amino acids are distributed
within the precursor polypeptide so that the α polypeptide has a
net negative charge at neutral pH whereas the β polypeptide is
positively charged under the same conditions.
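The sign of the net charge of the α and β polypeptides can be estimated by counting charged residues. The sketch below assumes +1 for each Lys or Arg and −1 for each Asp or Glu at neutral pH, ignoring His, the free termini and any pKa shifts; a proper calculation would sum Henderson-Hasselbalch terms for every ionizable group. The function name and the example peptides are hypothetical, not helianthinin sequence.

```python
def crude_net_charge(peptide: str) -> int:
    """Sign-level estimate of net charge at neutral pH.

    Counts +1 per Lys (K) or Arg (R) and -1 per Asp (D) or Glu (E);
    His, the free termini and pKa shifts are ignored.
    """
    s = peptide.upper()
    positive = s.count("K") + s.count("R")
    negative = s.count("D") + s.count("E")
    return positive - negative


# Hypothetical fragments.
print(crude_net_charge("DEEKAR"))  # -1
print(crude_net_charge("KRRDA"))   # 2
```

Applied separately to the segments on either side of the boxed α/β cleavage site, this count would be expected to give a negative value for α and a positive value for β if the distribution described above holds.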

The mechanism of post-translational processing 
and targeting of 11S globulins to protein bodies is
complex, and although in some cases sequences 
required for these events are phylogenetically con- 
served (Borroto and Dure, 1987), the molecular basis 
of these events remains to be elucidated. The initial 
processing event, cleavage of the signal peptide, 
occurs co-translationally and results in the transport 
of the cleaved polypeptide into the lumen of the ER. 

TABLE I

Amino acid composition of HaG3 precursor polypeptide as
predicted from the sequence in Fig. 2

Amino acid   Number (%)    Amino acid   Number (%)
Ala          38 (7.71)     Met           3 (0.60)
Cys           6 (1.22)     Asn          34 (6.90)
Asp          16 (3.25)     Pro          22 (4.46)
Glu          31 (6.29)     Gln          69 (14.0)
Phe          25 (5.07)     Arg          39 (7.91)
Gly          35 (7.10)     Ser          31 (6.29)
His           8 (1.62)     Thr          23 (4.66)
Ile          24 (4.87)     Val          29 (5.88)
Lys          10 (2.02)     Tyr           5 (1.01)
Leu          37 (7.51)     Trp           8 (1.62)
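Each percentage in Table I is the residue count divided by the precursor length. The Python sketch below recomputes that column and checks that the counts sum to the 493 residues stated in the text. It assumes Tyr = 5 and Trp = 8 (the counts consistent with the printed 1.01% and 1.62%) and assigns the three rows whose labels are missing in the scan (25, 10 and 37 residues) to Phe, Lys and Leu, the only residues otherwise absent from the table; these readings are assumptions, not values confirmed by the authors.

```python
# Residue counts for the HaG3 precursor as read from Table I. Assumptions:
# Tyr = 5 and Trp = 8 (counts matching the printed 1.01% and 1.62%), and the
# three unlabeled rows assigned to Phe, Lys and Leu.
HAG3_COUNTS = {
    "Ala": 38, "Cys": 6, "Asp": 16, "Glu": 31, "Phe": 25,
    "Gly": 35, "His": 8, "Ile": 24, "Lys": 10, "Leu": 37,
    "Met": 3, "Asn": 34, "Pro": 22, "Gln": 69, "Arg": 39,
    "Ser": 31, "Thr": 23, "Val": 29, "Tyr": 5, "Trp": 8,
}


def mole_percent(counts):
    """Return (total residues, {residue: percent of total})."""
    total = sum(counts.values())
    return total, {aa: 100.0 * n / total for aa, n in counts.items()}


total, pct = mole_percent(HAG3_COUNTS)
print(total)                 # 493, the predicted precursor length
print(round(pct["Ala"], 2))  # 7.71
print(round(pct["Gln"], 1))  # 14.0
```

That the assumed counts add up to exactly 493 is a useful consistency check on the reading of the two garbled Tyr and Trp entries.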



The probable NH2-terminal leader sequence of the
HaG3 precursor is indicated in Fig. 2; this site was
selected using the −1, −3 rules defined by von Heijne
(1986) for signal sequence cleavage site selection.
The location of the predicted α/β cleavage site is
boxed in Fig. 2 (see below).
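The −1, −3 rule constrains the residues one and three positions before the signal peptidase cleavage site. The sketch below is a deliberately simplified screen, assuming the commonly cited summary that position −1 carries a small residue and position −3 a small or aliphatic, uncharged residue; the residue sets, the 15-40 search window and the function name are assumptions, not von Heijne's published weight-matrix method and not the procedure used for HaG3.

```python
from typing import List

# Assumed residue classes for a simplified (-3, -1) screen.
MINUS1_OK = set("AGSCT")     # small residues at position -1
MINUS3_OK = set("AGSCTVIL")  # small or aliphatic, uncharged residues at -3


def candidate_cleavage_sites(protein: str, min_pos: int = 15,
                             max_pos: int = 40) -> List[int]:
    """Return 1-based positions j such that cleavage after residue j puts an
    allowed residue at -1 (residue j) and at -3 (residue j - 2)."""
    seq = protein.upper()
    hits = []
    # Keep a residue at -3 and at least one mature residue (+1) after the site.
    for j in range(max(3, min_pos), min(max_pos, len(seq) - 1) + 1):
        if seq[j - 1] in MINUS1_OK and seq[j - 3] in MINUS3_OK:
            hits.append(j)
    return hits


# Hypothetical N-terminus, not the HaG3 precursor.
print(candidate_cleavage_sites("MKFLLLFGSKEPK", min_pos=1))  # [8]
```

A screen like this typically returns several candidates for a real signal peptide; choosing among them needs the full position-specific scoring and the hydrophobic core context.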

(d) Divergent subfamilies encode helianthinin 

Hybridization and restriction analysis of two nearly full-length
cDNA recombinants, Ha2 and Ha10, suggested that sunflower 11S seed
proteins were encoded by two divergent subfamilies (Allen, 1986;
Allen et al., 1987b). These subfamilies are designated Ha2 and
Ha10, corresponding to the cDNAs that distinguish each subfamily.
Genomic blot analyses (Allen, 1986; Allen et al., 1987b) revealed
that the Ha2 subfamily includes at least three members and the
Ha10 subfamily includes two or more members. Genomic sequences
representative of each subfamily were isolated from a sunflower
genomic DNA library; restriction maps of these recombinants are
shown in Fig. 1. Regions that are complementary to either Ha2 or
Ha10 are indicated. Even at relaxed hybridization criteria
(6 × SET, 55°C), Ha2 and Ha10, or their genomic homologues,
cross-hybridize very poorly (data not shown). Based on the
intrafamilial sequence variation reflected in restriction site
locations in regions flanking helianthinin coding sequences
(Fig. 1), we conclude that HaG10-B1 and C1 are non-allelic members
of the Ha10 subfamily; similarly, HaG3-A1 and D1 are non-allelic
members of the Ha2 subfamily. Based on genomic blot analysis
(Allen, 1986; Allen et al., 1987b), the helianthinin genes shown in
Fig. 1 cannot represent all members of each subfamily. In the Ha2
subfamily, at least two additional members remain uncharacterized,
and in the Ha10 subfamily, there is at least one additional member.

The extent of divergence between the Ha2 and Ha10 subfamilies is illustrated in Fig. 3, where the DNA sequence from a region of HaG3, including the α/β cleavage site (Fig. 2), is compared to a similar region of the cDNA Ha10. Overall these nucleotide sequences share only 50% sequence similarity. The predicted Ha10 and HaG3 aa sequences share 43% similarity (data not shown). This latter observation suggests that the majority of the helianthinin coding sequence has diverged significantly, so much so that second-hit mutations contribute significantly to the overall divergence of the two helianthinin subfamilies. Although the two subfamilies are highly divergent throughout most of the protein coding sequence, the similarity of the DNA sequences encoding the α/β cleavage site approaches 90%; in this region, 24 of 27 nt are identical. The three nt differences in Fig. 3 are third-base changes; consequently, the predicted α/β cleavage sites of HaG3 and Ha10 are identical.

Fig. 3. Comparison of HaG3 and Ha10 sequences. Nucleotide sequences of Ha10 (upper sequence) and HaG3 (lower sequence) spanning the region encoding the α/β cleavage site were compared; the aa sequence of the α/β cleavage site (boxed in Fig. 2) is shown below the nucleotide sequence. Solid bars indicate identical nucleotides. Dots indicate gaps inserted to maximize sequence homology. The first 1479 nt of the HaG3 sequence are not shown.
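Two claims can be checked directly: that 24 of the 27 cleavage-site nucleotides are identical, and that all three differences are silent third-base changes. The sketch below uses the HaG3 and Ha10 cleavage-site sequences as printed in Fig. 4. The figure shows differences in lower case; they are upper-cased here. The codon table is deliberately partial and covers only the codons that occur in these two sequences, so it is not a general translator.

```python
# Cleavage-site coding sequences as printed in Fig. 4 (case normalized).
HAG3 = "AACGGTGTGGAAGAAACCATCTGCAGC"
HA10 = "AACGGTGTGGAAGAAACAATATGCAGT"

# Partial standard codon table: only codons present in these two sequences.
CODONS = {
    "AAC": "N", "GGT": "G", "GTG": "V", "GAA": "E",
    "ACC": "T", "ACA": "T", "ATC": "I", "ATA": "I",
    "TGC": "C", "AGC": "S", "AGT": "S",
}

def translate(seq):
    """Translate an in-frame sequence codon by codon."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))

def mismatch_positions(a, b):
    """1-based positions where two equal-length sequences differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]

diffs = mismatch_positions(HAG3, HA10)
identical = len(HAG3) - len(diffs)
print(f"{identical}/{len(HAG3)} identical ({100 * identical / len(HAG3):.1f}%)")
print("codon position of each difference:", [(p - 1) % 3 + 1 for p in diffs])
print(translate(HAG3), translate(HA10))
```

All three mismatches fall at codon position 3, and both sequences translate to the same cleavage-site peptide.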

(e) The α/β cleavage site is phylogenetically conserved



Comparison of the predicted aa sequences for sunflower helianthinin and Brassica cruciferin (Simon et al., 1985) revealed an overall similarity of 46%, including conservative aa differences (data not shown). However, the regions encoding the α/β cleavage site in helianthinin and cruciferin are nearly identical, differing by a conservative change from valine to leucine. The phylogenetic conservation at the α/β cleavage site is illustrated in Fig. 4. The α/β cleavage site and the nt sequence encoding each site are included for HaG3, Ha10 and Ha2 (Allen, 1986). The α/β cleavage sequences for legumin-like seed proteins in seven other species, including both monocots and dicots, are also summarized in Fig. 4. At the aa level, the sequence conservation is striking. The presence of a serine in the last position appears to be characteristic of LeB-type genes (Wobus et al., 1986); threonine at this position is indicative of LeA-type genes. Based on these data and the intron/exon structure, we conclude that the HaG3 helianthinin gene is a B-type gene. It is more similar to the LeB4 gene of V. faba (Baumlein et al., 1986) and the legJ/K genes of pea (Gatehouse et al., 1988) than to the prototypical legA gene of pea (Lycett et al., 1984). Thus far, all helianthinin genes or cDNAs analyzed are of the LeB-type.

(f) Conclusions

(1) The sunflower helianthinins are legumin-like seed proteins and are encoded by at least two divergent gene families defined by the sequences HaG3 and Ha10.

(2) Among legumin-like seed proteins of diverse plant species, the most conserved aa sequences are those required for appropriate post-translational processing and intracellular trafficking. With these constraints, the bulk of [illegible] storage proteins apparently are [illegible]; [illegible] exacting regulatory sequences [illegible] functionally conserved to ensure appropriate transcriptional regulation of legumin-like genes.

(3) Based on the aa sequence of the α/β cleavage site, the seed protein encoded by HaG3 is most [illegible].

[illegible] number and location among various legumin storage protein genes, it is likely that the progenitor gene for the B-type legumin genes was an A-type legumin gene (Lycett et al., 1984) [illegible] introns.

[Fig. 4 (sequences not reproduced): α/β cleavage-site amino acid and nucleotide sequences for Helianthus annuus (HaG3, Ha2, Ha10), Vicia faba, Pisum sativum (legA, legJ/K), Brassica napus, Arabidopsis thaliana (CRA-1, CRB), Gossypium hirsutum (subfamilies A and B), Avena sativa and Oryza sativa; HaG3 site NGVEETICS; consensus NGLEETICS.]

Fig. 4. Phylogenetic comparison of legumin α/β cleavage sequences. The complete aa sequence for the sunflower α/β cleavage site is shown; the downward arrow indicates the cleavage site. Only aa that differ from the sunflower sequence are shown for the other plant species. The nucleotide sequences encoding the α/β cleavage sites are displayed immediately below the corresponding complete or partial aa sequence. Nucleotides that differ from the HaG3 sequence are shown in lower-case letters. A consensus α/β cleavage site is indicated at the bottom of the figure. Data sources include Helianthus annuus: this work, Allen et al. (1987b), Allen (1986); Vicia faba: Wobus et al. (1986); Pisum sativum: Lycett et al. (1984), Gatehouse et al. (1988); Brassica napus: Simon et al. (1985); Arabidopsis thaliana: Pang et al. (1988); Gossypium hirsutum: Chlan et al. (1986); Avena sativa: Walburg et al. (1986), B. Larkins, pers. communic.; Oryza sativa: Takaiwa et al. (1987).
The sequence of a gene encoding convicilin, a seed storage protein in pea (Pisum sativum L.), is reported. This gene, designated cvcA, is one of a sub-family of two active genes. The transcription start of cvcA was mapped. Convicilin genes are expressed in developing pea seed cotyledons, with maximum levels of the corresponding mRNA species present at 16-18 days after flowering. The gene sequence shows that convicilin is similar to vicilin, but differs by the insertion of a 121-amino-acid sequence near the N-terminus of the protein. This inserted sequence is very hydrophilic and has a high proportion of charged and acidic residues. It is of a similar amino acid composition to the sequences found near the C-terminus of the α-subunit in pea legumin genes, but is not directly homologous with them. Comparison of this sequence with the 'inserted' sequence in soya-bean (Glycine max) conglycinin (a homologous vicilin-type protein) suggests that the two insertions were independent events. The 5' flanking sequence of the gene contains several putative regulatory elements, besides a consensus promoter sequence.



INTRODUCTION 

Convicilin has been termed a 'third storage protein' in pea seeds, in addition to legumin and vicilin [1]. It can be purified from both legumin and vicilin, and it consists solely of polypeptides of Mr approx. 71000. It thus does not contain polypeptides found in either of the two major storage proteins [2]. On the other hand, convicilin is antigenically similar to vicilin [1], and it is possible to produce molecules containing both vicilin and convicilin polypeptides; for this reason, some authors have considered that convicilin and vicilin are the same protein [3]. Sequence data for a partial cDNA clone, pCD 59, identified as encoding convicilin by hybrid-release translation, supported this view, since the deduced amino acid sequence was strongly homologous with that of vicilin [15]. However, pCD 59 did not hybridize to vicilin cDNA species [5] or vicilin genes [6].

Variation between pea lines in the mobility of convicilin polypeptides on SDS/polyacrylamide-gel electrophoresis has allowed a convicilin locus, designated 'cvc', to be mapped to chromosome 2 in pea [7]; it is distinct from any vicilin locus so far identified [8,9]. Convicilin has been shown to be encoded by a small gene family. Hybridization of the cDNA clones pCD 59 and pCD 75 (a longer version of pCD 59 [5]) to genomic DNA restricted with endonucleases detected one or two hybridizing fragments, depending on which probe was used [5,6,9].

The isolation of a genomic clone containing a convicilin gene, putatively corresponding to the cvc locus, has been described [9]. The present paper reports the sequence of this gene and its flanking regions, and shows that convicilin genes in pea (Pisum sativum L.) form a sub-family of the total family of vicilin-type genes.



MATERIALS AND METHODS 
Materials 

Pea seeds of the cultivar (cv.) Feltham First were obtained from Suttons Seeds, Torquay, Devon, U.K.; seeds of cv. Dark Skinned Perfection were from S. Dobie and Son, Torquay, Devon, U.K. The isolation of the genomic clone lambda JC4 and its sub-clone pJC 4-100, from a genomic library prepared from DNA isolated from Pisum sativum cv. Dark Skinned Perfection, has been described previously [9]. Reagents and enzymes for M13 DNA sequencing were from Gibco/BRL (Gibco, Paisley, Renfrewshire, Scotland, U.K.); restriction enzymes were supplied by Northumbrian Biologicals, Cramlington, Northd., U.K. S1 nuclease and other enzymes were from BCL, Lewes, East Sussex, U.K. Radiochemicals were supplied by Amersham International, Amersham, Bucks., U.K. Other reagents used were of analytical quality wherever possible. Nitrocellulose filters were type BA85 (Schleicher und Schuell) from Anderman and Co., East Molesey, Surrey, U.K.

Methods 

DNA sequencing. Restriction mapping on pJC 4-100 was carried out by conventional methods [10]. Preparation of subclones from pJC 4-100 in pUC18 or pUC19, preparation of sequencing subclones in M13mp18 or mp19, preparation of single-stranded DNA, and dideoxynucleotide DNA sequencing using [α-35S]thio-dATP were also carried out by standard techniques [11-14]. The sequence given was determined by overlapping sequences from subclones; both strands of the DNA were fully sequenced. Sequences were analysed by diagonal dot-matrix comparisons [15], using a program written by ourselves, and by manual comparisons supplemented by sequence-handling software (programs NUCALN and FASTN, kindly supplied by Dr W. Pearson). Hydrophilicity profiles were plotted using the method of Hopp & Woods [16].

The sequence data have been submitted to the EMBL/GenBank Data Libraries under the accession number Y00721.
† To whom correspondence and reprint requests should be addressed.
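A Hopp & Woods profile is a sliding-window average of per-residue hydrophilicity values. The sketch below assumes the published Hopp & Woods scale and a six-residue window; the window length used in this paper is not stated. The function name and the example peptide are hypothetical and chosen for illustration only.

```python
# Hopp & Woods (1981) hydrophilicity values, keyed by one-letter code.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}

def hydrophilicity_profile(seq, window=6):
    """Mean hydrophilicity per window; entry k covers residues k+1..k+window."""
    values = [HOPP_WOODS[aa] for aa in seq.upper()]
    return [sum(values[k:k + window]) / window
            for k in range(len(values) - window + 1)]

# Hypothetical peptide: a charged, acidic stretch followed by a hydrophobic one.
profile = hydrophilicity_profile("EEKREDLLIVFL")
print([round(v, 2) for v in profile])
```

Charged, acidic stretches such as the convicilin insertion described in the abstract score strongly positive. Hydrophobic stretches score negative.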

Blotting techniques. Restriction fragments from pJC 4-100 or its subclones were isolated from low-gelling-temperature agarose gels [17] and labelled with [α-32P]dCTP (400 Ci/mmol; 100 µCi used/0.2-0.5 µg of DNA) by nick translation [18]. 'Southern' blots were prepared from agarose-gel separations of restriction fragments, or of restriction-enzyme digests of pea leaf genomic DNA (purified as in [19]). They were hybridized to denatured labelled probes at 65 °C as described in [20], in 5 x SSC (1 x SSC is 0.15 M-NaCl/0.015 M-sodium citrate buffer, pH 7.2)/2 x Denhardt's solution (1 x Denhardt's solution is 0.02% Ficoll/0.02% bovine serum albumin/0.02% polyvinylpyrrolidone)/denatured herring sperm DNA (100 µg/ml). Subsequent washes were to a hybridization stringency of 0.1 x SSC at 65 °C. 'Northern' blots of agarose-gel separations of glyoxalated total RNA samples, prepared from pea (cv. Feltham First) cotyledons at different developmental stages as previously described [21], were prepared and hybridized to denatured labelled probes at 42 °C [22], in 5 x SSC/2 x Denhardt's solution/denatured herring sperm DNA (200 µg/ml)/50% (v/v) formamide. Subsequent washes were to a hybridization stringency of 0.1 x SSC/0.1% SDS at 50 °C. Autoradiographs were obtained by exposing the washed blots to preflashed X-ray film at -80 °C. Densitometry of the autoradiographs was carried out on an LKB (Bromma, Sweden) Ultroscan XL densitometer.

S1 mapping. S1 mapping was carried out as described by Favaloro et al. [23]. Each assay mixture contained 5 µg of polyadenylated RNA, prepared from pea (cv. Feltham First) cotyledons at a mid-development stage (14-15 days after flowering) as previously described [24]. It also contained at least 0.2 [illegible] (approx. 2 x 10[illegible] c.p.m.) of DNA probe, 5'-end-labelled [25] with [γ-32P]ATP (6000 Ci/mmol; 50 µCi used/0.2-0.5 µg of DNA). The protected fragment after S1 digestion was run on a DNA sequencing gel. Its 3' end was mapped by running, in adjacent tracks, a DNA sequencing reaction that covered the same region of sequence on the same strand and had been primed by an oligonucleotide primer whose 5' end corresponded to the site of labelling. Controls omitting RNA were carried out.

Protein sequencing. Convicilin was purified as previously described [1]. Portions (2 mg) of the protein, dissolved in 0.1% trifluoroacetic acid, were subjected to h.p.l.c. (Vydac reverse-phase C[illegible] column; elution with a gradient of acetonitrile in 0.1% trifluoroacetic acid) to remove traces of vicilin. Convicilin polypeptides were digested with trypsin, and the resulting peptides were separated by h.p.l.c. and sequenced by the manual dimethylaminoazobenzene isothiocyanate method, as previously described [26]. N-terminal sequences for convicilin were obtained by automated sequence determination on an Applied Biosystems model 371A protein sequencer, with on-line h.p.l.c. residue identification. A 0.3 mg sample of protein was used per determination.
RESULTS
Genomic clone 

A partial restriction map for the genomic subclone pJC 4-100 has been published previously [9]; a more detailed map, showing the position of the region sequenced, is given in Fig. 1(a). The clone contains approx. [illegible] kb of sequence 5'-flanking the convicilin coding sequence, and approx. 3 kb of 3'-flanking sequence; these regions do not contain sequences hybridizing to probes from the cvc coding sequence (results not shown). Regions of this clone outside the sequenced region are not discussed further in this paper.

The convicilin gene

The sequencing map for the convicilin gene is given in Fig. 1(b), and the complete sequence of the gene and its immediate 3' and 5' flanking regions is given in Fig. 2. We have designated this gene 'cvcA'. The predicted sequence of the encoded protein was deduced by homology with vicilin and by the presence of an open reading frame at the 5' end, and is also shown in Fig. 2. The coding nucleotide sequence is interrupted by five introns. Their positions could be inferred from the predicted and determined protein sequence (the present paper) and from the nucleotide sequences of several related clones: the convicilin cDNA species pCD 59 [5], the homologous Phaseolus vulgaris (French bean) vicilin (phaseolin) gene [27], and homologous pea vicilin cDNA species and genes ([28]; J. A. Gatehouse, D. Bown, M. Levasseur, R. Sawyer & T. H. N. Ellis, unpublished work). The sequence from start codon to stop codon thus contains six exons, of 661, 176, 75, 324, 283 and 197 bases respectively, and five introns, of 151, 103, 103, 88 and 97 bases respectively. The encoded amino acid sequence is 571 amino acids in length, and predicts a precursor polypeptide of Mr 66986. When the leader sequence of 28 amino acids (see below) is subtracted, the predicted Mr for the mature polypeptide is 63928. The discrepancy between this value and the polypeptide Mr determined for convicilin (71000) is discussed below.
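The exon and intron lengths quoted above are internally consistent with the 571-residue precursor: the six exons should total 571 codons plus one stop codon. The check below assumes that the first exon begins at the ATG and the last ends with the stop codon, as the phrase "from start codon to stop codon" implies.

```python
exons = [661, 176, 75, 324, 283, 197]   # bases, start codon to stop codon
introns = [151, 103, 103, 88, 97]       # bases
precursor_residues = 571

coding_bases = sum(exons)
# 571 sense codons plus one stop codon, three bases each.
assert coding_bases == 3 * (precursor_residues + 1)
print("coding bases:", coding_bases)
print("start-to-stop length including introns:", coding_bases + sum(introns))
```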

The 3' flanking sequence given extends for 428 bases after the stop codon; a further 450 bases of sequence have been determined, but do not show any significant features and will not be discussed further. Two polyadenylation sites are present in the 3' flanking sequence, 119 and 134 bases after the stop codon; the first of these is of the multiple overlapping type (AATAAATAAA) often found in plant genes [30]. The 5' flanking sequence contains a good match to the consensus sequence for a plant gene 'TATA' box [31] 66 bases before the start codon (CTATAAATA). Other sequence features in this region are discussed below.
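An overlapping polyadenylation signal such as AATAAATAAA contains the canonical AATAAA hexamer more than once. An ordinary non-overlapping search would miss the second copy. The sketch below uses Python's `re` module with a zero-width lookahead so that overlapping matches are reported. The function name is ours and hypothetical.

```python
import re

def motif_positions(seq, motif):
    """0-based start positions of motif in seq, including overlapping hits."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]

print(motif_positions("AATAAATAAA", "AATAAA"))  # two overlapping hexamers
print(motif_positions("CTATAAATA", "TATAAA"))   # TATA-box match quoted above
```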

Partial sequence of convicilin

The identity of the gene cvcA was confirmed by comparing its predicted protein sequence with partial protein sequence data from convicilin. In all, 16 residues at the N-terminus of convicilin and an additional [illegible] residues from 14 tryptic peptides were determined. Results are shown in Fig. 2. The determined sequences agree fully with the sequence predicted by cvcA and show that the first 28 residues of the predicted sequence are not present in the mature polypeptide. These removed residues constitute a typical 'leader' sequence [32]. At amino acid 209, two residues were found in tryptic peptides: N, as predicted by cvcA, and Q (one-letter notation). Peptides were obtained from all six exons, showing that the assignment of intron positions was valid.

Fig. 1. Restriction map of the clone pJC4-100 containing cvcA.
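The 28-residue leader is consistent with the Mr values quoted earlier. The difference 66986 - 63928 = 3058 should equal the summed residue masses of the removed segment. The check below rests on two assumptions:

- the leader reads MATTVKSRFPLLLFLGIIFLASVCVTYA, which is our reading of the damaged Fig. 2 up to the colon that marks the cleavage site;
- standard average amino acid residue masses apply; the paper does not say which mass table was used.

```python
# Average residue masses in Da (free amino acid mass minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}

LEADER = "MATTVKSRFPLLLFLGIIFLASVCVTYA"  # assumed reading of Fig. 2

# Precursor and mature chain each carry one terminal water, so the water
# cancels and the Mr difference equals the leader's summed residue masses.
leader_mass = sum(RESIDUE_MASS[aa] for aa in LEADER)
reported_difference = 66986 - 63928

print(len(LEADER), round(leader_mass, 1), reported_difference)
```

The summed residue mass of this reading comes to within about 0.2 Da of the reported difference.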

Expression of cvcA

An S1 mapping experiment was carried out to confirm the expression of cvcA and to locate the transcription start. The [illegible] restriction fragment, covering bases -561 to 143 in cvcA, was isolated and 5'-end-labelled. The labelled fragment was hybridized to polyadenylated RNA isolated from developing pea cotyledons; the nucleic acids were then treated with S1 nuclease and analysed by gel electrophoresis. Results are shown in Fig. 3(a). Protected fragments of 139-150 bases were obtained. This suggests that an mRNA had identical sequence with the probe from base 143 in cvcA to a region 24-35 bases 5' to the ATG start codon. The base designated '+1' was that giving the most intense band in the S1 mapping assay, i.e. the underlined base in the protected sequence region CATCATCTAAAG. Protected fragments extending to the A bases in the consensus transcription start sequences -CATC- [31] in the above region were observed, but gave less intense bands in the S1 mapping assay. Control experiments with no RNA present gave no protected fragment. A further S1 mapping experiment used the NsiI-EcoRV restriction fragment, covering bases -382 to 257 in cvcA, and gave protected fragments ending in the region -8 to +2. In this case both the S1 mapping assay and its control with no RNA present gave protected fragments corresponding in length to the original probe.
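The link between protected-fragment length and the mapped 5' end of the mRNA is simple arithmetic. The sketch below makes two assumptions:

- the protected fragment runs from the mRNA 5' end up to base +143 of the probe;
- the numbering has no position 0, so base -1 lies immediately 5' of +1.

Under these assumptions, the 139-150-base fragments map to 5' ends between +5 and -7. The function name is hypothetical.

```python
def five_prime_end(fragment_length, labelled_end=143):
    """Coordinate of the mRNA 5' end implied by an S1-protected fragment.

    Assumes numbering runs ..., -2, -1, +1, +2, ... (no zero) and that the
    protected fragment extends from the 5' end up to `labelled_end`.
    """
    pos = labelled_end - fragment_length + 1
    return pos if pos > 0 else pos - 1  # skip the non-existent position 0

for length in (139, 143, 150):
    print(length, five_prime_end(length))
```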

The developmental expression of convicilin genes was also studied by hybridization of part of the sequence of this gene to total RNA prepared from pea cotyledons at different stages of seed development. The probe fragment was chosen to include only the 5' end of the coding sequence of the gene to avoid cross-hybridization to vicilin mRNA species. Pea cotyledon RNA was glyoxalated, size-fractionated by electrophoresis and blotted on to nitrocellulose before hybridization to the SstI-BglII fragment (bases -176 to 462) of cvcA, labelled by nick translation. The results of this experiment are shown in Fig. 3(b). The probe hybridized to two bands of similar mobility on the Northern blot, corresponding to mRNA species of approx. 2650 and 2500 bases. The larger of the two species consistently gave a more intense hybridization signal, the ratio of the integrated peak areas of the two bands being approx. 3:1 (±0.7) in all tracks. Vicilin mRNA species have previously been identified as approx. 1700 bases in size [33], and no evidence of hybridization to them was obtained, showing that the probe was specific for convicilin mRNA species. The relative intensities of the hybridizing bands from different developmental stages show that the proportion of convicilin mRNA species in total RNA increases as cotyledon expansion proceeds, to a maximum at 16-18 days after flowering, and decreases thereafter. The peak in convicilin mRNA levels agrees with previous observations that convicilin synthesis is maximal during the second half of cotyledon expansion [34].

[Fig. 2 (sequence not reproduced): cvcA from base -561 to base 2723 with the predicted convicilin precursor sequence; the TATA box, the two polyadenylation signals and introns IVS-1 to IVS-5 are marked.]

Fig. 2. Sequence of gene cvcA ('CvcA'), with the predicted sequence of the convicilin precursor polypeptide ('A.A.'). The site of cleavage of the leader sequence is indicated by a colon (:). The base designated +1 is indicated by a [symbol illegible]. Other sequence features are as indicated on the Figure. The N-terminal sequence determined for convicilin and the sequences of convicilin tryptic peptides are indicated by double and single underlining, respectively; vertical lines indicate the termini of the peptides.

Hybridization to genomic DNA

Pea genomic DNA from cvs. Feltham First and Dark Skinned Perfection was digested with various restriction enzymes, size-fractionated by agarose-gel electrophoresis and blotted on to nitrocellulose. The blots were then hybridized with the labelled convicilin-specific probe (SstI-BglII; bases -176 to 462) described above. Results are shown in Fig. 4. The two cultivars gave identical band patterns in all restriction digests made. Digests with EcoRI gave two bands. One, of approx. 13 kb, corresponds to the EcoRI fragment in pJC 4-100. The other, of approx. 9.0 kb, corresponds to the EcoRI fragment previously identified as hybridizing to the convicilin cDNA species pCD 59 and pCD 75 [5]. Both these bands were present at an indicated level of approx. one copy per haploid genome, as shown by a reconstruction assay in which gene copy equivalents of pJC 4-100 were hybridized on the same filters. All other restriction digests gave two or more hybridizing bands, consistent with the restriction map (see Fig. 1). Their intensities were consistent with the conclusion that two convicilin genes are present per haploid genome, in agreement with previous reports [6].

Fig. 3. Expression of convicilin gene cvcA. (a) S1 mapping experiment to locate the transcription start in cvcA. The protected fragment is run in track S. (b) Northern blot showing hybridization of the SstI-BglII probe (bases -176 to 462) from cvcA to total RNA from pea cotyledons at stages from 7-[illegible] d.a.f. to 21-22 d.a.f. [24,32]. A 10 µg portion of total RNA was loaded per track on the original gel. The molecular-size scale is taken from standard RNA species (ribosomal RNAs).
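The reconstruction assay rests on a simple mass ratio. In a track carrying a mass m of genomic DNA, one copy of a sequence per haploid genome is matched by a mass of cloned plasmid equal to m × (plasmid length / haploid genome length). The 10 µg load in the sketch below comes from the Fig. 4 legend. The plasmid and genome sizes are hypothetical placeholder values chosen for illustration, not figures from this paper.

```python
def copy_equivalent_pg(genomic_ug, plasmid_bp, haploid_genome_bp, copies=1):
    """Plasmid mass (pg) giving `copies` per haploid genome in a genomic load."""
    genomic_pg = genomic_ug * 1e6  # 1 µg = 10^6 pg
    return copies * genomic_pg * plasmid_bp / haploid_genome_bp

# HYPOTHETICAL sizes for illustration only: a 10 kb plasmid and a
# 4.5 x 10^9 bp haploid genome. Substitute measured values in practice.
for n in (1, 2, 5):
    print(n, "copies:", round(copy_equivalent_pg(10, 10_000, 4.5e9, n), 1), "pg")
```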

DISCUSSION 
Coding sequence 

The amino acid sequences predicted by cvcA and found for convicilin confirm the presence of a 'leader' sequence on the precursor polypeptide, as had been previously suggested by translation experiments in vitro [35]. The sequence for the mature polypeptide predicted by cvcA is then in good agreement with the amino acid composition of convicilin, as shown in Table 1. The presence of one methionine residue in the mature polypeptide is correctly predicted by cvcA. Its position (amino acid 388) is consistent with the observed results of CNBr cleavage of convicilin, which generates two fragments of approx. 55000 and 15000 Mr [1].

Despite the evidence that cvcA is a convicilin gene and that it is expressed, its sequence differs from that of the convicilin cDNA identified by Domoney & Casey [4], which was used to select the genomic clone containing cvcA. The overall homology between the two sequences is 94% over 590 corresponding bases. The main difference between the two sequences is a deletion of 18 nucleotides (six amino acids) in pCD 59 relative to cvcA, corresponding to a region near the hypothetical α:β subunit processing site in vicilin [26]. There are also a number of conservative amino acid substitutions in the remainder of the sequence (not shown). These sequence differences are sufficient to account for the previous observation [5] that pCD 59 hybridized to only one of the two convicilin genes detected by the cvcA probe in the present study. The data suggest that pCD 59 represents the second convicilin gene detected by hybridization to genomic DNA, cvcB, which is thus shown to be functional. When pCD 59 was hybridized to RNA from developing pea cotyledons [5], only one band was detected on a 'Northern' blot, as opposed to the two detected by the cvcA probe. This suggests that cvcA and cvcB each give rise to a distinct mRNA species. Further data will be necessary to confirm this conclusion.

Fig. 4. Southern blot showing hybridization of the SstI-BglII probe (bases -176 to 462) from cvcA to restriction digests of genomic DNA from lines Feltham First (F) and Dark Skinned Perfection (D). A 10 µg portion of DNA was loaded per track on the original gel. Restriction enzymes used were as follows: A and B, EcoRI; D and E, Bst[illegible]; F and G, BamHI; H and I, EcoRV. The blot is calibrated with gene equivalent amounts [33] of digested pJC4-100; the indicated copy numbers per diploid genome are given above tracks C, J, K and L. Tracks A-C are from a different gel to the remainder. The molecular-size scale is from restriction digests of standard DNA species run on the original gels.

Homology with vicilin. A dot-matrix comparison of the polypeptide sequences predicted for convicilin and for a vicilin 50000-Mr polypeptide is given in Fig. 5. The sequences are strongly homologous over most of their length, with short areas of low homology apparent at regions corresponding to the sequences around the putative α/β and β/γ subunit processing sites in vicilin. These areas have previously been noted as being of low homology when pea vicilin polypeptides are compared with those from different species [28]. The major difference between the two sequences is a large insertion in the convicilin sequence near its N-terminus, corresponding to sequence inserted between amino acids 3 and 6 of the mature vicilin polypeptide. Homology over the region -3 to +3 is weak at the amino acid level, but significant at the nucleotide level; outside this region and the insertion, homology is strong in both directions (see Fig. 5). The convicilin leader sequence is homologous with that in vicilin, but not with leader sequences in other seed proteins (results not shown), showing that the extra sequence in convicilin represents an insertion into a vicilin gene rather than a 5' addition to it. The strong homology of convicilin with vicilin outside the inserted sequence accounts for the overall similarity in properties between the two proteins and for their antigenic similarity [1]; it would also account for their ability to form molecules containing polypeptides of both vicilin and convicilin.
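A dot-matrix comparison of the kind described above can be sketched as a windowed identity search. This is a generic version, assuming a window length and a match threshold chosen for illustration; the parameters actually used for Fig. 5 are not stated here, and the function names are hypothetical.

```python
def dot_matrix(seq_a: str, seq_b: str, window: int = 5,
               threshold: int = 4) -> list[tuple[int, int]]:
    """Return (i, j) for every pair of windows seq_a[i:i+window] and
    seq_b[j:j+window] sharing at least `threshold` identical positions."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        wa = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            wb = seq_b[j:j + window]
            if sum(x == y for x, y in zip(wa, wb)) >= threshold:
                hits.append((i, j))
    return hits


def render(seq_a: str, seq_b: str, hits: list[tuple[int, int]]) -> str:
    """Text plot: rows follow seq_a, columns follow seq_b, '*' marks a hit."""
    grid = [["." for _ in seq_b] for _ in seq_a]
    for i, j in hits:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


if __name__ == "__main__":
    # Toy sequences: the second carries a 3-residue insertion ("XXX").
    a, b = "ABCDEFG", "ABCXXXDEFG"
    print(render(a, b, dot_matrix(a, b, window=3, threshold=3)))
```

In the toy example the diagonal of hits is displaced sideways by the length of the insertion, which is how a large insertion such as the one in convicilin appears on a dot-matrix plot. Raising the threshold relative to the window makes the plot more stringent and suppresses background noise.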

The homology in amino acid and corresponding nucleotide sequences between cvcA and vicilin genes in pea (results not shown; homology at the nucleotide level between the vicilin cDNA pAD2.1 [29] and corresponding sequence regions in cvcA is 70%) shows that the cvcA gene should be regarded as belonging to a sub-family of the vicilin gene family; this designation supports both previous views, that convicilin was distinct from vicilin and that it was essentially the same as vicilin [3].

Table 1. Amino acid composition of convicilin: comparison of predicted and experimental compositions



Amino acid   Residues     Composition (mol/100 mol)
             predicted    Predicted    Found*
D                         10.37        11.64
N            16
T            13
S            40
E            80
Q            33
P            25           4.60         5.47
G            27           4.97         5.90
A            18           3.31         4.23
C            1            0.17         0.13
V            27           4.97         4.46
M            1            0.17         0.13
I            24           4.42         3.85
L            49           9.02         8.71
Y            15           2.76         2.59
F            20           3.68         3.30
W            3            0.55         ND†
K            43           7.92         8.18
H            12           2.21         2.22
R            53           9.76         8.15

* From [1].
† ND, not determined.
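The predicted column of a table like Table 1 follows from residue counts by simple normalization. A minimal sketch, assuming the experimental values come from acid hydrolysis, which converts Asn and Gln to Asp and Glu (so those pairs can only be compared as Asx and Glx) and commonly destroys Trp; the function names are hypothetical.

```python
def pool_amides(counts: dict[str, int]) -> dict[str, int]:
    """Combine Asp + Asn as 'Asx' and Glu + Gln as 'Glx' (one-letter codes in)."""
    merge = {"D": "Asx", "N": "Asx", "E": "Glx", "Q": "Glx"}
    pooled: dict[str, int] = {}
    for aa, n in counts.items():
        key = merge.get(aa, aa)
        pooled[key] = pooled.get(key, 0) + n
    return pooled


def mol_percent(counts: dict[str, int]) -> dict[str, float]:
    """Convert residue counts to mol/100 mol, rounded to two decimals."""
    total = sum(counts.values())
    return {aa: round(100.0 * n / total, 2) for aa, n in counts.items()}


if __name__ == "__main__":
    # Toy counts, not the convicilin sequence.
    toy = {"D": 1, "N": 2, "E": 3, "Q": 4, "K": 5, "R": 5}
    print(mol_percent(pool_amides(toy)))
```

Pooling before normalizing keeps the predicted values directly comparable, row by row, with values measured on a hydrolysate.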



Nature of the inserted sequence in convicilin. The inserted sequence in convicilin will be considered as amino acids 4–124 or nucleotides 121–483. At the amino acid level, the sequence contains a high proportion of charged and hydrophilic residues (38 glutamate, 24 arginine and 9 lysine residues; only 10 residues are hydrophobic). It is similar in its composition to the C-terminal regions of the α-subunits encoded by the 'major' and 'minor' pea legumin genes [37] (Gatehouse & D. Bown, unpublished work), but the actual amino acid sequences are not significantly homologous when compared by a dot-matrix homology plot (results not shown). This additional sequence is presumably responsible for the differences in physical properties between vicilin and convicilin, e.g. solubility and binding to hydroxyapatite [1]. The predicted Mr values for the mature convicilin polypeptide and its N-terminal CNBr fragment are not in complete agreement with those observed on SDS/polyacrylamide-gel electrophoresis. This discrepancy is a consequence of abnormal migration on electrophoresis, possibly due to the atypical amino acid composition that the 'inserted' sequence gives these polypeptides.
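Counting charged and hydrophobic residues over a 1-based, inclusive residue range can be sketched as follows. The residue classes are an assumption: the text does not list which residues it counted as hydrophobic, so the set below is one common, simple choice, and the helper names are hypothetical.

```python
ACIDIC = set("DE")
BASIC = set("KRH")
# Assumption: a simple, commonly used hydrophobic set; not taken from the paper.
HYDROPHOBIC = set("AVILMFWC")


def segment(protein: str, start: int, end: int) -> str:
    """Residues start..end inclusive, using 1-based numbering."""
    return protein[start - 1:end]


def residue_profile(seq: str) -> dict[str, int]:
    """Length plus counts of acidic, basic and hydrophobic residues."""
    seq = seq.upper()
    return {
        "length": len(seq),
        "acidic": sum(c in ACIDIC for c in seq),
        "basic": sum(c in BASIC for c in seq),
        "hydrophobic": sum(c in HYDROPHOBIC for c in seq),
    }


if __name__ == "__main__":
    # Toy peptide, not the convicilin insert.
    print(residue_profile("EERKLAG"))
```

The 1-based inclusive convention matches the way residue ranges are quoted in the text, so a range such as 4–124 yields 121 residues.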
At the nucleotide level, the inserted sequence is A+G-rich, again like the C-terminal regions of legumin α-subunits; however, overall homology of nucleotide sequence in these regions is no more than marginally significant by dot-matrix comparison. No introns are present in the inserted sequence. There is no evidence of inverted repeats at the ends of the inserted sequence, nor strong evidence for direct repeats in or near the sequence itself (results not shown). The origin of this sequence is therefore unclear; it may represent a sequence inserted by a transposable element or by some other mechanism.
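The two nucleotide-level checks mentioned above, purine (A+G) content and terminal repeats, can be sketched as below. This is an exact-match simplification, assuming a fixed end length `k`; a real search would tolerate mismatches and scan flanking windows, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def purine_fraction(seq: str) -> float:
    """Fraction of A + G in a nucleotide sequence."""
    seq = seq.upper()
    return sum(base in "AG" for base in seq) / len(seq)


def terminal_repeats(seq: str, k: int) -> dict[str, bool]:
    """Exact-match test of the first and last k bases for a direct repeat
    (same orientation) and an inverted repeat (reverse complement)."""
    seq = seq.upper()
    left, right = seq[:k], seq[-k:]
    return {"direct": left == right, "inverted": left == reverse_complement(right)}


if __name__ == "__main__":
    print(purine_fraction("AAGGCT"))
    print(terminal_repeats("AAGCTTTTTGCTT", 4))
```

Inverted terminal repeats are a common signature of transposable-element insertions, which is why their absence leaves the origin of the inserted sequence open.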

Relationship to vicilin-family genes in other species

The relationships of the coding sequences of vicilins in pea, Phaseolus vulgaris (phaseolin) and soya bean (conglycinin) have been extensively analysed, and part of the coding sequence of convicilin has been shown to be homologous with those of phaseolin and conglycinin [38]. Both convicilin and conglycinin have large inserted coding sequences (121 and 174 amino acids respectively) near the N-terminus of the mature protein, relative to the vicilin/phaseolin type. The inserted sequences in convicilin and conglycinin also show similarity at the nucleotide level, in that both sequences are A+G-rich. However, the inserted sequences in the two genes are not significantly homologous at either the amino acid or the nucleotide sequence level. Further, the remaining coding sequences of the two genes, although homologous, are less homologous with each other than convicilin in pea is with pea vicilin, suggesting that the divergence of the pea gene sub-families took place after the separation of pea and soya bean as species. If this is the case, the insertion events were independent of each other. Further analysis of other storage-protein gene sequences (results not shown) suggests that the insertion of hydrophilic, predominantly acidic, amino acid sequence regions is a frequent mechanism of storage-protein mutation in legumes.
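Statements of the form "less homologous with each other than convicilin is with vicilin" amount to comparing pairwise identity values computed on a shared alignment. A minimal sketch, assuming the sequences are already aligned to equal length and that gap columns are skipped (one of several possible conventions); the function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity (%) between two aligned sequences of equal length.
    Columns with a gap ('-') in either sequence are skipped (assumed convention)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy aligned sequences.
    print(percent_identity("ACGT", "ACGA"))
```

Because different gap conventions give different numbers, such comparisons are only meaningful when every pair is scored the same way.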

The flanking sequences 

3' Flanking sequence. The 3' flanking sequence of cvcA does not show any unusual features when compared with other plant storage-protein genes.

5' Flanking sequence. Features of potential interest in the 5' flanking sequence of cvcA were shown by dot-matrix sequence comparisons between this gene and other plant storage-protein genes. Comparisons of the 5' flanking sequence of cvcA with those of conglycinin and phaseolin genes show three areas of sequence conservation besides the 'TATA' box promoter element (considered previously); the conserved regions are shown in Fig. 6. There is also a conserved region around the transcription start, which has an obvious functional role, and a possible further conserved region at 30–50 bases 5' to the 'TATA' box. This latter region is not as well conserved or defined as the other regions, but does include the putative CCAAT sequences of phaseolin and conglycinin [39].

Fig. 6. Putative enhancer sequences in the 5' flanking regions of cvcA

The three corresponding regions of high sequence homology between pea convicilin (Psa cvcA), Phaseolus vulgaris phaseolin β (Pvu phas β) and soya-bean conglycinin α' (Gma cgly α') gene 5' flanking sequences are given: the 'vicilin-box' region, upstream region 1 and upstream region 2, each labelled with its distance to the 'TATA' box. Bases the same in all three sequences are indicated by an asterisk. Homologous regions around the transcription start and the 'TATA' box are not shown.
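The asterisk row used in Fig. 6 marks columns where every aligned sequence carries the same base. A minimal sketch, assuming the sequences have already been aligned to equal length with '-' for gaps; the function name and the toy sequences are hypothetical, not the Fig. 6 data.

```python
def identity_line(aligned: list[str]) -> str:
    """Return a line with '*' under columns where all aligned sequences
    carry the same base, and a space elsewhere (gap columns never match)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same aligned length")
    marks = []
    for i in range(length):
        column = {s[i].upper() for s in aligned}
        marks.append("*" if len(column) == 1 and "-" not in column else " ")
    return "".join(marks)


if __name__ == "__main__":
    # Toy alignment of three sequences.
    seqs = ["ACGTCA", "ACCTCA", "ACGTGA"]
    for s in seqs:
        print(s)
    print(identity_line(seqs))
```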

The 'vicilin-box' region [39] in all three genes is in a similar position (approx. 100 bases 5' to the 'TATA' box) and is strongly homologous; it can be divided into two regions separated by 11–12 bases of T-rich sequence. The 5' region is a highly conserved C-rich sequence, GCCACCTC, whereas the 3' region is more typical of the 5' flanking sequence as a whole (consensus TCAACACNCGTCAANNTC/ACAT). It has been suggested that this region, present also in pea vicilin genes, is involved in determining tissue-specificity of expression of the gene family [39]. The other two conserved regions are approx. 150–200 bases and 250 bases 5' to the 'TATA' box. Like the 'vicilin box', they contain a conserved C-rich core sequence (CTCAACCC and GATCGCCGC respectively) and are associated with less highly conserved sequence, i.e. sequence typical of the 5' flanking region as a whole. The hypothesis that such C-rich sequences act as enhancers of gene expression may be advanced, and is supported by the previous observation that the 'vicilin-box' C-rich sequence is homologous with a viral enhancer sequence [39,40]. However, fine-structure analyses, such as those carried out in transgenic petunia plants [41], are needed to test this hypothesis.
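Locating conserved cores such as CTCAACCC or GATCGCCGC in other 5' flanking sequences is a pattern search, and a degenerate consensus can be handled the same way by translating it into a regular expression. A minimal sketch, assuming 'N' means any base and 'X/Y' means X or Y at a single position (an assumed reading of the slash notation); the function names are hypothetical.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus into a regular expression: 'N' matches any base,
    'X/Y' matches X or Y at one position (assumed reading of the notation)."""
    parts = []
    i = 0
    while i < len(consensus):
        base = consensus[i].upper()
        if base == "N":
            parts.append("[ACGT]")
            i += 1
        elif i + 2 < len(consensus) and consensus[i + 1] == "/":
            parts.append(f"[{base}{consensus[i + 2].upper()}]")
            i += 3
        else:
            parts.append(base)
            i += 1
    return "".join(parts)


def find_motif(seq: str, consensus: str) -> list[int]:
    """0-based start positions of all matches, overlapping ones included
    (the lookahead lets consecutive matches share bases)."""
    pattern = re.compile(f"(?={consensus_to_regex(consensus)})")
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence containing two copies of a C-rich core.
    print(find_motif("TTCTCAACCCGGCTCAACCC", "CTCAACCC"))
```

A string search like this only finds candidate elements; whether they act as enhancers is the functional question that, as the text notes, needs transgenic analysis.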

We thank Dr. H. Hirano, National Institute of Agrobiological Resources, Tsukuba, Japan, for carrying out protein sequencing, John Gilroy for performing manual protein sequencing, and Paul Preston for skilled technical assistance in DNA sequencing. We also thank
financial support from the Agriculture and Food Research